
An AI-based system typically comprises several supporting components alongside the core AI model. Each component, as well as the AI model itself, should be evaluated for intellectual property (IP) protection, both individually and in combination with the others.
The simplified block diagram below depicts an exemplary AI-based system. The depicted system includes an implemented AI model and supporting system components that may be considered for some form of intellectual property protection. Real-world embodiments of such systems will have different configurations and include additional elements. However, this simplified presentation facilitates an initial discussion of an AI-based system in the context of intellectual property.

The diagram shows that the system components include training data, data preprocessing, model training, the trained model, trained model execution, model output, and model refinement.
Training Data
Training data can consist of text, images, audio, video, and structured data from various sources, including social media, online articles, books, scientific papers, and databases. This data is meticulously collected, annotated, and processed to teach AI models to understand, interpret, and generate human-like responses or to perform specific tasks such as image recognition, language translation, or predictive analysis. The quality and quantity of training data are crucial to the performance and accuracy of AI models. High-quality training data should be representative, unbiased, diverse, and sufficiently large to cover the complexity and nuances of the real world.
While training data itself cannot be patented, innovative methods of collecting, processing, configuring, or using training data for AI models may be patentable. Also, training data collected and compiled into datasets used to train proprietary AI models may be considered a trade secret.
Moreover, copyright may protect the sources of training data, especially those involving creative works such as text, images, and videos.
Data Preprocessing
Data preprocessing comprises a series of operations that convert raw data into a suitable format for effectively training AI models. The objective of data preprocessing is to improve the quality of the training data, making it more appropriate for modeling and enhancing the AI model's performance by addressing issues such as missing values, noise, and inconsistencies in the training data. The operations comprising data preprocessing may include the following:
Cleaning: Removes inaccuracies and corrects the data to eliminate errors and inconsistencies.
Normalization and Standardization: Scales numeric data to fit within a specific range (normalization) or to have a particular distribution, such as zero mean and unit variance (standardization), so that features are compared on a common scale.
Feature Extraction and Selection: Identifies and selects the most relevant features for training the AI model. A feature is a specific, measurable aspect of the data or a function of it. This operation helps reduce the data's dimensionality and improve model efficiency.
Data Augmentation: Creates synthetic data from the existing dataset to increase its size and variability. This process is especially beneficial for image and speech recognition data.
Encoding and Tokenization: Converts categorical data into numerical values and breaks down text into tokens or symbols that a model can use.
Handling Missing Values: Ensures the dataset's integrity by filling gaps with estimated values (imputation) or by removing incomplete records.
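Several of the operations listed above can be illustrated with a minimal Python sketch. The function names and sample values below are hypothetical and chosen only for illustration. Real preprocessing pipelines typically rely on dedicated libraries and far larger datasets.

```python
from statistics import mean, pstdev

def impute_missing(values):
    """Handling missing values: replace None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: rescale to zero mean and unit population std. deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def one_hot_encode(labels):
    """Encoding: map each categorical label to a 0/1 vector."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

def tokenize(text):
    """Tokenization: split text into lowercase, whitespace-delimited tokens."""
    return text.lower().split()

# Hypothetical raw column with one missing entry.
raw_ages = [20, None, 40, 60]
ages = impute_missing(raw_ages)        # [20, 40, 40, 60]
scaled = min_max_normalize(ages)       # [0.0, 0.5, 0.5, 1.0]
z_scores = standardize(ages)
encoded = one_hot_encode(["cat", "dog", "cat"])
tokens = tokenize("Patent the Model")
```

Even in this toy form, each step involves choices, such as which imputation rule or scaling range to use, and it is at this level of detail that a preprocessing method may begin to differ from known approaches.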
Innovative techniques for data preprocessing, such as novel feature extraction or data augmentation methods, may be patentable.
Alternatively, methods and algorithms developed for preprocessing data may constitute trade secrets, especially if they offer a competitive advantage by enhancing data quality or model performance.
Training Methods
Various techniques and algorithms enable AI models to make predictions or decisions based on input data. These techniques and algorithms are categorized based on the nature of the learning process and the data type used, including the following:
Supervised Learning: This method trains a model on a labeled dataset, where each example in the training set is paired with the correct output. The model learns to predict the output based on the input data. This approach is widely used for classification and regression tasks.
Unsupervised Learning: Unlike supervised learning, this method works with unlabeled datasets, in which examples are not paired with a correct output. The goal is to identify patterns, clusters, or relationships within the data, making it useful for cluster analysis, association mining, and dimensionality reduction.
Semi-supervised Learning: This method utilizes both labeled and unlabeled data for training, bridging supervised and unsupervised learning. It is particularly beneficial when labeling data is prohibitively costly or time-consuming, because a small labeled dataset can be supplemented with a larger pool of unlabeled data.
Reinforcement Learning: This method employs an agent that learns to make decisions through interaction with an environment while attempting to achieve a goal. The agent is guided by rewards or penalties for its actions, steering it toward the optimal strategy to achieve the goal. It is used in robotics, games, autonomous vehicles, and similar technologies.
Transfer Learning: This method involves taking a model trained for one task and further training that same model for a second related task. It is particularly effective when the second task has limited data, as it leverages the knowledge from the first task to enhance performance.
Federated Learning: This method uses a distributed approach that trains models across multiple devices or servers, each holding local data samples without exchanging them. This method enhances privacy and minimizes the need for data centralization.
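As a concrete illustration of the first category above, supervised learning, the following Python sketch fits a simple linear model to labeled examples using gradient descent. The function name, the data, and the learning rate and epoch count are all assumptions made for this example. It is not a description of any particular production training method.

```python
def train_linear_model(examples, learning_rate=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical labeled dataset: each input x is paired with its correct output y.
# The points lie on the line y = 2x + 1.
labeled_examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = train_linear_model(labeled_examples)
prediction = w * 4.0 + b  # predict the output for an unseen input, x = 4
```

The learning rate and number of epochs are examples of training parameters. In practice, the selected values and the procedure for selecting them can form part of a training methodology.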
New and non-obvious algorithms, architectures, or methodologies for training an AI model may qualify for patent protection. This includes those methodologies that define specific training parameters.
Alternatively, training methods that provide a business advantage and are not publicly known may be protected as a trade secret. These trade secrets may include proprietary training methods, model training parameters, and specialized datasets.
Trained Models
AI models vary widely, each designed to address specific tasks or problems. These models can be broadly categorized based on their learning methods, functionality, or the type of data they process. Below is an example of standard AI model categorization.
Rule-based Systems: These systems rely on predefined rules, such as if-then statements, and are most suitable for tasks with clear and well-defined rules.
Machine Learning Models: These AI models use algorithms that learn from large amounts of training data. They are often categorized based on their training methods, including supervised, unsupervised, semi-supervised, and reinforcement learning models.
Deep Learning Models: These models implement multi-layered artificial neural networks to learn hierarchical representations of data. These models may be based on various types of neural networks, including feed-forward neural networks, convolutional neural networks, recurrent neural networks, generative adversarial networks, autoencoders, and transformers.
Natural Language Processing Models: These models, typically deep learning models, are designed to understand and generate human language. They are used for language-based tasks such as text classification, sentiment analysis, translation, and chatbot development.
Computer Vision Models: These models, also typically deep learning models, are designed to interpret and understand visual information from images and videos, often relying on convolutional neural networks.
Generative Models: These models, also typically deep learning models, are designed to create new data instances resembling a specific dataset, often using variational autoencoders and generative adversarial networks.
Copyright protection should be considered for portions of the code implementing AI models, including custom neural network architectures and proprietary software for training and deployment.
Trade secrets should be considered for confidential AI models, algorithms, and datasets.
For innovative AI model architectures, training algorithms, or novel applications, patent protection should be considered.
Executing a Trained AI Model
Running a trained AI model accurately and efficiently involves several key steps:
(1) Loading the model with its corresponding weights and architectural configuration into an application or service,
(2) Processing input data to make predictions or inferences,
(3) Interpreting or transforming the raw output data into a usable format, and
(4) Monitoring the model's performance to ensure it operates as expected.
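The four steps above can be sketched in Python as follows. The serialized model, its weights, and the `Monitor` class are hypothetical and stand in for the model files, inference runtimes, and monitoring services a real deployment would use.

```python
import json
import math

# Hypothetical serialized model: a small logistic-regression classifier.
SERIALIZED_MODEL = json.dumps({
    "architecture": "logistic_regression",
    "weights": [1.5, -2.0],
    "bias": 0.25,
    "labels": ["negative", "positive"],
})

def load_model(serialized):
    """Step 1: load the weights and architectural configuration."""
    model = json.loads(serialized)
    if model["architecture"] != "logistic_regression":
        raise ValueError("unsupported architecture")
    return model

def predict_probability(model, features):
    """Step 2: process input data to produce a raw prediction."""
    score = sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]
    return 1.0 / (1.0 + math.exp(-score))

def interpret(model, probability, threshold=0.5):
    """Step 3: transform the raw output into a usable label."""
    return model["labels"][1] if probability >= threshold else model["labels"][0]

class Monitor:
    """Step 4: track simple statistics about the model's behavior."""
    def __init__(self):
        self.count = 0
        self.positive = 0

    def record(self, label):
        self.count += 1
        self.positive += int(label == "positive")

    def positive_rate(self):
        return self.positive / self.count if self.count else 0.0

model = load_model(SERIALIZED_MODEL)
monitor = Monitor()
results = []
for features in [[2.0, 0.5], [0.0, 1.0]]:
    label = interpret(model, predict_probability(model, features))
    monitor.record(label)
    results.append(label)
```

A sudden shift in a monitored statistic, such as the positive rate here, is one simple signal that a deployed model may no longer be operating as expected.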
Novel methods designed to enhance the execution speed, improve output quality, or broaden the types of input data an AI model can handle may be eligible for patent protection.
Model Output
Copyright and trade secret protection should be considered if the AI model output consists primarily of informative data.
If the AI model output is something akin to a product specification, a chemical structure, or an algorithm's code, patent protection should be considered, provided there is substantial human involvement in producing that output.
Refinement of Trained AI Models
Refinement may enhance a model's performance, efficiency, and generalization capabilities and, therefore, can be a critical phase in the machine learning lifecycle. Refinement may include steps and techniques applied after the initial training to help improve the model. It becomes necessary when the model fails to meet performance expectations, needs adaptation to new data or tasks, or faces issues such as overfitting, underfitting, or computational inefficiency.
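As one hedged illustration of refinement aimed at computational inefficiency, the Python sketch below applies magnitude-based weight pruning, a technique not named in this article and used here only as an example. The function name, weights, and `keep_fraction` value are hypothetical.

```python
def prune_weights(weights, keep_fraction=0.5):
    """Zero out the smallest-magnitude weights, keeping roughly keep_fraction.

    If several weights tie at the cutoff magnitude, all of them are kept.
    """
    keep = max(1, round(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

# Hypothetical weights of an already-trained model.
trained_weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]

pruned_weights = prune_weights(trained_weights)
remaining = sum(1 for w in pruned_weights if w != 0.0)
```

Weights set to zero can be skipped at execution time, which may reduce computation. A refined model would normally be re-evaluated afterward to confirm that its accuracy remains acceptable.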
Patent protection may be considered for refinement methods and techniques that are new and non-obvious.
A Combination of Elements
As with many inventions, while the individual components of an AI-based system may not be patentable, the combination of these individual elements may be novel and non-obvious and, therefore, may itself be patentable.
How TCP Law Can Help
A TCP Law patent attorney can help you prepare, file, and prosecute your artificial intelligence (AI) or machine learning (ML) based utility patent application.
For assistance with protecting AI-related inventions or with any patent issue, please contact TCP Law at info@tcplawfirm.com or 917-612-1059.