April 30, 2021
In Part I of this series on Disclosing AI Inventions, we discussed the basics of machine learning and the unique disclosure challenges presented by the “black box” nature of trained machine learning models. Nevertheless, current U.S. patent laws are generally viewed as sufficient to ensure adequate disclosure of machine learning inventions to the public, and the courts will be left to shape the details of the disclosure requirements through interpretation of existing patent law. In this Part II, we discuss techniques for disclosing machine learning inventions in compliance with the written description and enablement requirements of 35 U.S.C. § 112(a).
The test for sufficiency of the written description is whether the patent disclosure reasonably conveys to those skilled in the art that the inventor had possession of the claimed subject matter as of the filing date. Enablement requires that the invention be described in a way that allows one skilled in the art to make and use the invention without undue experimentation. Because both inquiries turn on what the disclosure conveys to a skilled artisan, compliance is a fact-specific determination that will depend on the type of machine learning improvement claimed in the patent or application.
Machine learning inventions can generally be classified as either “core inventions” or “applied inventions.” Core machine learning inventions advance the field of machine learning itself. These improvements typically relate to mathematical or statistical information processing technologies with general applicability across many different problem domains. Core inventions typically improve aspects of the underlying training algorithm, such as the optimization strategy or cost function, or, in the case of artificial neural networks, provide a new type of network topology with improved performance. These inventions are commonly conceived through academic research and generally relate to aspects of the machine learning algorithm that precede training, that is, before the algorithm is run on domain-specific data.
By contrast, applied machine learning inventions use machine learning techniques to solve specific problems in technical fields other than AI itself. These inventions typically relate to aspects of training and deploying a machine learning model to perform a particular task, and they have driven the surge in patent filings in recent years. Much of that growth is attributable to low-cost graphics processing units (GPUs), which provide computing resources powerful enough to apply machine learning to problems across many diverse fields.
The assessment of sufficiency of disclosure will generally involve different considerations for improvements related to core machine learning technology than for improvements related to application of a trained machine learning model to solve problems in other technologies. Thus, the amount of detail and attention given by patent drafters to specific aspects of a machine learning invention will generally depend on whether the improvement is a core invention or an applied invention.
Disclosing Core Machine Learning Inventions
Inventions involving improvements in the underlying training algorithm must be described in such a way as to demonstrate that the inventors had possession of the algorithm at the time of filing and to enable one of ordinary skill in the art to make and use the algorithm without undue experimentation. As discussed in Part I, the algorithm is the automated training process that creates the trained model optimized to perform a specific task. Accordingly, the specification should disclose core improvements in the underlying algorithm by describing the automated steps of the optimization process and detailing the steps responsible for the improved performance. This is typically done with descriptive tools familiar to those who disclose conventional software inventions, such as flow charts, pseudocode, and, if available, portions of source code. Mathematical or statistical formulas should also be used as needed, for example, to describe any novel activation functions or loss functions that improve performance of the algorithm.
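For a sense of what a source-code excerpt in a specification might look like, the sketch below writes out a loss function and the automated optimization steps of a minimal training loop: plain gradient descent on a one-feature logistic model with a binary cross-entropy loss. Nothing here is novel; the function names, toy data, and hyperparameter defaults are assumptions chosen only to show how each step of the process can be made explicit.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w: float, b: float, xs: list[float], ys: list[int]) -> float:
    """Binary cross-entropy loss averaged over the training samples."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(xs)


def train(xs: list[float], ys: list[int],
          learning_rate: float = 0.5, epochs: int = 200) -> tuple[float, float]:
    """Gradient descent: repeatedly step the parameters against the loss gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For a sigmoid output with BCE loss, dLoss/dz = p - y for each sample.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data: label 1 for positive x, 0 for negative x.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
print(bce_loss(0.0, 0.0, xs, ys), bce_loss(w, b, xs, ys))  # loss before vs. after training
```

The loss at the untrained starting point (w = b = 0) is ln 2, and training drives it lower; a formula for a genuinely novel loss or activation function would slot into the same structure.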
Core inventions may also lie in unique architectures of models to be trained by the algorithm. For example, new types of neural network topologies may provide improved efficiencies in the training process or improved accuracy in output results that are applicable across a wide range of problem domains. Here, the new structural framework of the network must be described in sufficient detail to demonstrate possession of the invention and to enable one of ordinary skill in the art to reproduce the network topology. This can be done using annotated structural diagrams that describe the individual nodes, the arrangement and function of node layers, and the interconnectivity between such layers in a neural network.
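A structural diagram is often accompanied by a textual or tabular description of each layer. As a hypothetical sketch, the code below records a small fully connected topology (node count, activation, and which layers feed which) and derives the number of trainable weights from that description alone; the layer names, sizes, and the skip connection are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    nodes: int
    activation: str
    inputs_from: list[str]  # names of the layers that feed this one


def count_parameters(layers: list[Layer]) -> int:
    """Weights plus biases, assuming every listed link is fully connected."""
    sizes = {layer.name: layer.nodes for layer in layers}
    total = 0
    for layer in layers:
        fan_in = sum(sizes[src] for src in layer.inputs_from)
        if fan_in:
            total += fan_in * layer.nodes + layer.nodes  # weights + one bias per node
    return total


# Hypothetical topology, including a skip connection from input to output.
topology = [
    Layer("input", 4, "identity", []),
    Layer("hidden_1", 8, "relu", ["input"]),
    Layer("hidden_2", 8, "relu", ["hidden_1"]),
    Layer("output", 3, "softmax", ["hidden_2", "input"]),
]
print(count_parameters(topology))  # 151
```

Here the count is 40 + 72 + 39 = 151, which shows that a layer-by-layer description is precise enough for a reader to reconstruct the network's structure.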
By contrast, extensive description of the training procedures, the trained model, and model output results in a specific technical area may not be needed to meet the disclosure requirements for core machine learning inventions. Because core improvements are generally independent of any problem domain, a thorough description of domain-specific training data and of how that data was collected and cleaned is unlikely to help one skilled in the art make and use a core invention. Similarly, detailed descriptions of the manual tuning of hyperparameters during a particular training session and of the integration of a trained model into a larger system are less important when disclosing core inventions. As a result, core inventions typically avoid the “black box” disclosure concern, namely the difficulty of explaining the inner workings of a trained model.
Despite the emphasis on pre-training aspects for core machine learning inventions, the patent disclosure must still include some description of the training aspects and model output results to support functional claims that capture the broad applicability of core inventions across different technical fields. The USPTO has recently emphasized that its disclosure guidelines for functionally claimed computer-implemented inventions also apply to AI-related inventions. These guidelines explain that a compliant disclosure must describe not only the computer and algorithm used to perform the claimed function, but must also provide sufficient examples to demonstrate possession and enablement of the full scope of the functionally claimed invention. Thus, the disclosure of a core machine learning invention should include several example implementations of the novel algorithm or architecture in various data domains to support broad claims of improved training efficiency or model performance across different technical areas.
Disclosing Applied Machine Learning Inventions
The trained model is at the center of applied machine learning inventions and presents the greatest challenges to patent disclosure because its inner workings are not fully understood or easily explained. As discussed in Part I of this series, the predictive capabilities of a trained model are captured by the statistical weighting values embedded in the model. Even if these numerical weights are explicitly disclosed in a patent, they are unlikely to support the full scope of a claimed invention, because such values have little meaning even to experts and cannot be reproduced, given the randomness associated with the optimizations performed by the training algorithm. In view of this black box aspect of applied machine learning inventions, a thorough and detailed description of the training data, training procedures, model output results, and system integration of the trained model must serve as the basis for demonstrating that the inventor had possession of the trained model and for enabling one of ordinary skill in the art to make and use the trained model without undue experimentation.
Disclosure of the training data for applied inventions can range from explicit disclosure of complete datasets to a description of the dataset's characteristics. Explicit disclosure of the full dataset may present legal and practical problems for patent drafters. For example, training data often must be excluded from a patent specification because of privacy laws or the proprietary nature of the data. Even where disclosure of the full dataset is permissible, it is often impractical given the massive amount of data used to train modern deep learning networks. Therefore, to help sufficiently disclose applied machine learning inventions, the specification should describe the source and quantity of the training data, any preprocessing or cleaning of that data, and its detailed characteristics.
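The reproducibility point can be seen even in a toy setting. The sketch below trains the same simple model twice on the same data, changing only the random seed that controls weight initialization and sample order. A perceptron is used purely because it is small, and the data are made up. In this example the two runs end with different weight values yet classify the training data identically, which illustrates why disclosed weights alone say little about how to rebuild an equivalent model.

```python
import random

# Hypothetical, linearly separable data: label +1 when x1 + x2 > 0, else -1.
SAMPLES = [((1, 1), 1), ((2, 1), 1), ((1, 2), 1),
           ((-1, -1), -1), ((-2, -1), -1), ((-1, -2), -1)]


def predict(w, x):
    """Sign of w . [x1, x2, 1]; the last weight acts as a bias."""
    score = w[0] * x[0] + w[1] * x[1] + w[2]
    return 1 if score > 0 else -1


def train_perceptron(seed, max_epochs=100):
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(3)]  # random initialization
    data = list(SAMPLES)
    for _ in range(max_epochs):
        rng.shuffle(data)  # random presentation order each epoch
        mistakes = 0
        for x, y in data:
            if y * (w[0] * x[0] + w[1] * x[1] + w[2]) <= 0:  # misclassified
                w[0] += y * x[0]
                w[1] += y * x[1]
                w[2] += y
                mistakes += 1
        if mistakes == 0:  # converged: every sample classified correctly
            break
    return w


w_a = train_perceptron(seed=1)
w_b = train_perceptron(seed=2)
print(w_a, w_b)  # different weight values...
print(all(predict(w, x) == y for w in (w_a, w_b) for x, y in SAMPLES))  # ...same behavior: True
```

Because the data are separable, the perceptron is guaranteed to converge, so both runs reach zero training errors even though their final weights differ.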
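When the dataset itself cannot be published, the characteristics mentioned above can still be computed and reported. The sketch below uses a handful of made-up records and a hypothetical cleaning rule (drop any record with a missing value) to produce the kind of summary a drafter might describe: raw quantity, records removed during cleaning, and label distribution.

```python
from collections import Counter


def clean(records):
    """Hypothetical cleaning rule: drop any record with a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def describe(raw, cleaned, label_key="label"):
    """Summarize dataset quantity, the effect of cleaning, and label balance."""
    labels = Counter(r[label_key] for r in cleaned)
    return {
        "raw_records": len(raw),
        "dropped_in_cleaning": len(raw) - len(cleaned),
        "training_records": len(cleaned),
        "label_distribution": {k: n / len(cleaned) for k, n in sorted(labels.items())},
    }


# Made-up sensor records standing in for a proprietary dataset.
raw = [
    {"temperature": 71.2, "vibration": 0.03, "label": "normal"},
    {"temperature": 88.9, "vibration": None, "label": "fault"},
    {"temperature": 90.4, "vibration": 0.41, "label": "fault"},
    {"temperature": 69.8, "vibration": 0.02, "label": "normal"},
    {"temperature": 72.5, "vibration": 0.05, "label": "normal"},
]
summary = describe(raw, clean(raw))
print(summary)
```

For these records the summary reports 5 raw records, 1 dropped during cleaning, 4 retained, and a 25%/75% fault/normal split, all without revealing any individual record.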
In particular, one skilled in the art can often reproduce the predictive or classification capabilities of a trained model from an adequate disclosure of the correlation between the model's input data and output results. In supervised machine learning, this relationship is at least partly known to domain experts in advance and is used as the basis for specifying the feature set and output labels for the training samples. In other circumstances, relationships between inputs and outputs may be discovered through feature extraction performed by the training algorithm and/or through experts' trial-and-error adjustments to the training data when building and training a model. Whether known in advance or learned from the training process, the known input features, possible outputs, and any known input/output correlations should be described in detail in the specification as a basis for demonstrating possession and enablement of an accurate trained model. Where input/output correlations remain difficult to describe even after training, the relationships can be demonstrated by test results that verify the prediction or classification capabilities of the trained model. It may be prudent to describe any known correlations and also provide test results to bolster the sufficiency of disclosure for these inventions.
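Test results are commonly reported as accuracy and a confusion matrix computed on held-out samples that were not used in training. The sketch below computes both for hypothetical labels and predictions from an imagined defect classifier; the data are invented and serve only to show the calculation.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count each (actual, predicted) label pair."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical held-out test labels and the model's predictions for them.
y_true = ["defect", "ok", "ok", "defect", "ok", "ok", "defect", "ok"]
y_pred = ["defect", "ok", "defect", "defect", "ok", "ok", "ok", "ok"]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.75
for (actual, predicted), n in sorted(confusion_matrix(y_true, y_pred).items()):
    print(f"actual={actual:7} predicted={predicted:7} count={n}")
```

Here 6 of 8 predictions match, so the accuracy is 0.75; the confusion matrix additionally shows that one defect was missed and one good part was flagged, which is the kind of detail that can verify a model's classification capability.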
A thorough description of any human intervention in the training process can also assist in sufficiently disclosing applied machine learning inventions. Specifically, the intuition involved in selecting and tuning hyperparameters during the training process is often based on domain expertise that leads to a trained model with improved performance. For example, the configuration of conventional neural nodes, the sequence of hidden layers of known functionality, the specified learning rate, or a combination of these training choices may be the basis for the improved performance of the trained model and should be described in the specification. Indeed, the seemingly routine selection of hyperparameters when running an off-the-shelf algorithm on domain-specific data may produce a new type of network architecture with the broader applicability of a core machine learning invention. Similarly, describing the integration of a trained model into specialized equipment deployed for a specific task may support disclosure of an applied invention. These training and deployment details may be even more important in disclosing applied inventions for which the relationship between input and output data is largely unknown.
A patent drafter should err on the side of comprehensive disclosure of all of the above-noted aspects of applied machine learning inventions to ensure that reproducing the trained model does not require “undue experimentation.” In applying the familiar In re Wands factors to the question of enablement, the Federal Circuit has made clear that a considerable amount of experimentation is permissible as long as it is merely routine and the specification provides a reasonable amount of guidance as to the direction in which the experimentation should proceed. Thus, the goal in disclosing applied machine learning inventions should be to compensate for the inability to explain the model itself by guiding one skilled in the art with a thorough description of the training data, training procedures, and output results that permits reproduction of the model with only routine experimentation.
Despite the emphasis on training aspects and model output results for applied machine learning inventions, the core technology involved must also be disclosed, particularly to support functionally claimed inventions. As noted above, USPTO disclosure guidelines require a description of the algorithm that performs the claimed function or result. Thus, the patent specification should include a basic description of the training algorithm and the architecture of the trained model. For applied inventions, this may involve simply naming off-the-shelf algorithms or well-known network architectures. However, it may also be necessary to provide several examples of known algorithms and architectures that implement the domain-specific data and training improvements in order to support the full scope of functional or results-oriented claims to the applied machine learning invention.
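One way a drafter might capture the human training choices listed above is a structured record that can be reproduced verbatim in the specification or an appendix. The sketch below is hypothetical throughout: the class name, field names, layer sizes, and values are illustrative assumptions, not recommended settings.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrainingConfig:
    """Hypothetical record of human training choices (all values illustrative)."""
    hidden_layer_sizes: list[int] = field(default_factory=lambda: [128, 64, 32])  # hidden-layer sequence
    activation: str = "relu"     # node configuration
    learning_rate: float = 1e-3  # optimizer step size
    batch_size: int = 64
    epochs: int = 20
    random_seed: int = 0         # seed a training script would use for initialization and data order


config = TrainingConfig(learning_rate=5e-4)
record = json.dumps(asdict(config), indent=2, sort_keys=True)
print(record)

# Round trip: the disclosed record rebuilds an identical configuration.
restored = TrainingConfig(**json.loads(record))
print(restored == config)  # True
```

Recording the choices in one place makes it easier to check that every hyperparameter credited with the improved performance actually appears in the disclosure.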
Conclusion
Rapid advances in machine learning technology have presented challenges for disclosing AI inventions. Although familiar disclosure tools of conventional software patents are available to describe core machine learning inventions, patent drafters will need to develop new disclosure techniques for applied machine learning inventions in view of the black box nature of trained models at the center of these inventions. Disclosure of the underlying training algorithm coupled with a comprehensive description of the training data, human input to the training process, and system integration of trained models will be useful to meet disclosure requirements for applied machine learning inventions.