Protecting the Data that Fuels AI

May 28, 2021

Recently, I had the privilege of participating in a couple of virtual conferences as a panelist on issues related to artificial intelligence (AI) and intellectual property (IP). Normally, when I speak at a conference, I cover many of the issues you may have read about in our previous blog posts. I tend to focus heavily on patent prosecution issues in obtaining an AI patent, whether in the U.S. or around the globe. However, these two recent panels focused on how to protect data as an asset. While protecting data (or information in general) has long been a grey area in the realm of IP, the issue has risen to the forefront due to AI and the rise of “Big Data.”

Imagine that you are a corporation that has invested a lot of resources in obtaining labeled training data that could be used to achieve an AI solution. For instance, you have captured millions of images of objects on or around roadways, each of which is labeled (pre-associated with the identified object, such as a “stop sign,” “pedestrian,” “traffic light,” or the like). This dataset has clear value to the automated driving industry because it helps the AI driving the car recognize objects in the roadway more accurately. Perhaps you want to license this dataset to companies developing automated driving technology, or perhaps you want to use it to develop your own automated driving technology. Whatever the case, you need to protect that dataset. What do you do? More specifically, what are the practical and legal pros and cons of the protection tools available?
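To make the scenario concrete, here is a minimal Python sketch of what such a labeled dataset might look like. The `LabeledImage` name, the file paths, and the three sample records are hypothetical and purely illustrative; this is a sketch of the idea, not a description of any real dataset format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledImage:
    """One training record: an image paired with the object it depicts."""
    image_path: str  # hypothetical file location, for illustration only
    label: str       # the identified object, e.g. "stop sign"


# A tiny stand-in for the millions of labeled images in the scenario.
dataset = [
    LabeledImage("images/000001.jpg", "stop sign"),
    LabeledImage("images/000002.jpg", "pedestrian"),
    LabeledImage("images/000003.jpg", "stop sign"),
]


def label_counts(records):
    """Count how many records carry each label."""
    return Counter(record.label for record in records)


counts = label_counts(dataset)
```

Each record is simply an image paired with a label, and that pairing is the asset the rest of this post is concerned with protecting.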

Let’s start with patents, or more succinctly, let’s start by ruling out patents. Without diving into decades of case law interpreting what is and what is not patentable subject matter, let’s just all agree that one cannot simply claim a dataset in its original form. In other words, I cannot just submit my 8 million labeled images of stop signs to the USPTO and get a patent on that collection. Now, if I come up with an ingenious method of collecting those 8 million images…that’s a different story…and a blog post for a later date.

Ok, so what about copyrights? Copyright law may be used to protect original works of authorship, and even computer software. However, in the United States, facts alone are not protected by copyright. In many cases, the data used for AI is merely factual. Yes, in theory, the images in a collection could be works of authorship that are entitled to copyright protection. Realistically though, the image data used for training AI usually does not consist of originally photographed samples, but is collected from many sources. The “data” that matters in our above example is the combination of the image data and a label describing the image data, which is a factual association. Additionally, the data to be protected is not always image data; it could be temperature data, traffic volume data, or any number of data items that can be sensed in the Internet of Things.

However, there is an important protection available for data in copyright law – database protection. Databases can be protected by copyright as a compilation if the right conditions are present.[1]

As the Supreme Court put it in Feist Publications v. Rural Telephone:

“Factual compilations... may possess the requisite originality. The compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers. These choices as to selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity, are sufficiently original that Congress may protect such compilations through the copyright laws.”[2]

Therefore, the selection and arrangement of the data must be sufficiently original or creative. In other words, it is the selection and arrangement of the data that receives copyright protection, not the underlying facts. In a future post, we will explore the differences between U.S. database protection and the Database Directive in Europe, which includes a sui generis right that protects a substantial investment in obtaining, verifying or presenting the contents of a database.[3]

This leads us to trade secrets. In general, a trade secret:

is information that has either actual or potential independent economic value by virtue of not being generally known,
has value to others who cannot legitimately obtain the information, and
is subject to reasonable efforts to maintain its secrecy.[4]

But what does it mean to take “reasonable efforts” to keep the information secret? In the U.S. there is no bright-line rule, but reasonable efforts may include a combination of confidentiality and non-disclosure agreements,[5] protective steps beyond normal business practices,[6] proactively seeking return of secret information from departing employees,[7] and the use of encryption.[8]

It is easy to see why many owners of Big Data are tempted to go down the trade secret rabbit hole. As we saw in the Google v. Oracle[9] Supreme Court case, if you wrongly assume you have copyright protection and the data is released to the public, there is no going back. However, beyond the extra effort required to take reasonable measures to protect a trade secret, trade secret protection also requires a high level of trust in business relationships, such as employer-employee and licensor-licensee. For instance, if you license the data to another party and contractually require the licensee to guard it, how would you detect whether the data had been leaked to another entity, which then used it to build a neural network used for AI?

The issues surrounding protection of the data utilized in AI are plentiful and raise urgent concerns for many companies that find themselves in possession of large amounts of such data. This post was just a primer, but we will take a closer look at the different challenges and the different solutions experienced around the globe on this issue in upcoming posts.

[1] https://sco.library.emory.edu/research-data-management/publishing/copyright-data.html

[2] Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991)

[3] https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31996L0009:EN:HTML

[4] https://www.uspto.gov/ip-policy/trade-secret-policy

[5] Abrasic 90 Inc. v. Weldcote Metals, Inc., 364 F.Supp.3d 888, 899 (N.D. Ill. 2019)

[6] Id. at 900

[7] Id.

[8] Id. at 901

[9] Google LLC v. Oracle America, Inc., No. 18-956 (S. Ct. Apr. 5, 2021)