Application of AI in Litigation Services

August 27, 2021

As a follow-up to the June 11, 2021 post entitled “Application of AI in Legal Services,” this post examines the use of artificial intelligence (AI) in an additional legal area: eDiscovery in litigation. One of the many time-consuming tasks in litigation is the review of documents to assess (a) whether a document is relevant to any outstanding discovery request or mandatory disclosure, and, if so, (b) whether the document is covered by at least one protection that would prevent its disclosure (e.g., attorney work product, attorney-client privilege, spousal privilege, and/or another privilege). Furthermore, document review often is performed with an eye toward flagging certain documents as being important to an issue in the case, either as part of a defense or as part of a party’s case in chief.

Given the increasing amounts of information produced and stored by companies, lawyers are turning to automated review processes that not only scan, deduplicate, and tag documents, but now also use AI tools that perform “predictive coding” of the documents for any number of issues. Indeed, some courts are explicitly allowing the use of predictive coding on the condition that its use is disclosed to all parties. For instance, the Court in Seagen Inc. v. Daiichi Sankyo Co., Ltd. stated the following:

To contain costs in the identification of non-duplicative relevant ESI from the email collections of identified custodians for review and production, the Parties may use advanced search and retrieval technologies, including e mail threading, predictive coding, or other technology-assisted review, given the use of such technology is disclosed to the other Party prior to its use. In the case of a Party’s disclosure of the intended use of predictive coding or technology assisted review, within seven days of disclosure, the other Party is permitted to request information regarding the confidence metrics to be used. The Parties agree to meet and confer regarding the specific analytics and confidence metrics that will be exchanged and a schedule for the exchange.[1]

Predictive coding allows text-based documents and/or documents that have been converted to text (via an optical character recognition technique) to be semi-automatically classified into corresponding categories (e.g., relevant, privileged, work-product, damages-related, infringement-related, design-around-related, inventorship-related, and reduction-to-practice-related) by using an AI-based classification system. To do so, some platforms first identify the categories to be used when classifying documents and then select a small subset of the documents (e.g., randomly, programmatically, or manually) to serve as the initial training documents. Reviewers often then manually code each of those documents with the categories, if any, to which it belongs. Once coded, the initial documents are used to train one or more initial AI models that provide an initial capability to predict whether future documents belong to the various categories.
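The seed-set training step described above can be illustrated with a minimal sketch. It assumes a single hypothetical category ("privileged" versus "not_privileged"), a four-document seed set invented for illustration, and a simple naive Bayes text classifier written with only the Python standard library. Commercial review platforms use their own models and features; this is not any vendor's method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed_docs):
    """Train a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of seed documents
    for text, label in seed_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in the seed set
                # Laplace (add-one) smoothing avoids zero probabilities.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical seed set, hand-coded by a reviewer (illustrative only).
seed = [
    ("Please keep this legal advice from our attorney confidential", "privileged"),
    ("Our counsel advises that the contract exposes us to liability", "privileged"),
    ("Lunch menu for the cafeteria this week", "not_privileged"),
    ("The quarterly sales meeting is moved to Friday", "not_privileged"),
]
model = train(seed)
print(predict(model, "Forwarding the advice from counsel on the contract"))
print(predict(model, "Cafeteria lunch is moved to Friday"))
```

With so few seed documents the model only picks up on a handful of words, which is why the later refinement batches matter.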

After the initial models are generated, further batches of documents typically are run through those models to check how well the models are performing. As users agree or disagree with the models’ categorizations of each batch, the initial models can be refined to better learn whether a document does or does not belong to a particular category. As an increasing number of batches is applied to each refined model, the refined model should provide increasingly accurate predictions. Once the predictions of the refined models on a sufficient number of batches meet a specified confidence metric, the system generates final models for each of the categories. All the remaining documents can then be classified using the final models to produce a reviewed document set. This classification methodology also has the added advantage of potentially helping a lawyer find documents related to a selected classification that might not have been found by a traditional word search.
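The batch-by-batch refinement can be sketched as a loop that stops once reviewer agreement stays at or above a target. Everything here is an assumption made for illustration: the word-weight model, the perceptron-style update on disagreement, the 90% target, the two-batch stability requirement, and the sample documents. Real platforms define their own models and confidence metrics.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def predict(weights, text):
    """Predict True (relevant) when the summed word weights are positive."""
    return sum(weights[w] for w in tokenize(text)) > 0


def refine(weights, batches, target=0.9, stable_batches=2):
    """Review batches until agreement holds at or above the target.

    Each batch is a list of (text, reviewer_label) pairs. When the reviewer
    disagrees with a prediction, the weights of that document's words are
    nudged toward the reviewer's label (a perceptron-style update).
    Returns the per-batch agreement rates that were observed.
    """
    streak = 0
    history = []
    for batch in batches:
        agreed = 0
        for text, label in batch:
            if predict(weights, text) == label:
                agreed += 1
            else:
                step = 1 if label else -1
                for word in tokenize(text):
                    weights[word] += step
        rate = agreed / len(batch)
        history.append(rate)
        streak = streak + 1 if rate >= target else 0
        if streak >= stable_batches:
            break  # hypothetical stopping rule: stable agreement reached
    return history


# Hypothetical review batches; True means the reviewer coded "relevant".
batches = [
    [("patent claim chart draft", True), ("office party on friday", False)],
    [("revised patent claim", True), ("draft party invitation", False)],
    [("patent chart update", True), ("friday party reminder", False)],
    [("claim chart for the patent", True), ("party invitation list", False)],
    [("this batch is never reached", False)],
]
weights = defaultdict(int)  # an untrained starting model, for simplicity
history = refine(weights, batches)
print(history)
```

In this toy run the agreement rate climbs over the first batches and the loop stops before the last batch. A production workflow would normally measure agreement on documents held out from training so that the metric is not flattered by documents the model has already seen.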

No system, however, is perfect, and the output of the final models should be reviewed to verify that the models classified the documents accurately, especially where Attorney-Client Privilege and Work Product documents are being classified. In a future post we will examine clawback agreements between parties, which address the return under Federal Rule of Evidence 502 (FRE 502) of inadvertently produced documents covered by Attorney-Client Privilege or Work Product protection.
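One common way to perform such a check is to draw a random sample of the machine-classified documents, have a reviewer code the sample by hand, and compare the two sets of calls. The sketch below assumes a hypothetical sample of 100 documents and reports precision and recall for a "privileged" category. The function names and the numbers are illustrative only and are not taken from any order or platform.

```python
import random


def sample_for_review(doc_ids, sample_size, seed=0):
    """Pick a reproducible random sample of document IDs for manual review."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))


def precision_recall(pairs):
    """Compute precision and recall from (model_says, reviewer_says) pairs."""
    tp = sum(1 for m, r in pairs if m and r)      # both say privileged
    fp = sum(1 for m, r in pairs if m and not r)  # model over-flagged
    fn = sum(1 for m, r in pairs if not m and r)  # model missed a privileged doc
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical document IDs and sample selection.
doc_ids = list(range(1, 1001))
to_review = sample_for_review(doc_ids, 100)

# Hypothetical outcome of the manual review of that sample:
# 8 correct privilege calls, 2 false alarms, 2 misses, 88 correct negatives.
pairs = (
    [(True, True)] * 8
    + [(True, False)] * 2
    + [(False, True)] * 2
    + [(False, False)] * 88
)
precision, recall = precision_recall(pairs)
print(len(to_review), precision, recall)
```

For privilege review, the misses (privileged documents the model failed to flag) are the costly errors, because those are the documents that could be produced inadvertently, so recall on the privileged category deserves particular attention.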

[1] See Order Regarding E-Discovery, Seagen Inc. v. Daiichi Sankyo Co., Ltd., 2:20-cv-00336 (EDTX), Docket Entry 65, paragraph 11.