We read about it every day. A lawyer uses a large language model (LLM) to do some research. They copy that research into a brief, but the research contains cases that don’t exist. The lawyer is busted, the judge furious, and the client starts looking for a better lawyer.

It has everyone scratching their heads. After all, everyone knows AI systems will do this, so why does it keep happening? A new Cornell University study sheds some light on this: the problem of overreliance, and why the volcano of serious AI flaws may be about to erupt. Quite simply, the cost of verifying the results of the AI tools exceeds any savings from their use. It’s a paradox.

In Part I of this examination of why a volcano of AI problems may be about to erupt, I looked at the dangers of overreliance on AI given the gaps in the underlying infrastructure. But there’s more to the story. The simple fact is that AI tools have fundamental reality and transparency flaws, and ignoring those flaws is risky and downright foolhardy. Given the breadth and depth of the impact of these flaws and the corresponding cost to verify outputs, the use and role of AI in legal may end up being more limited than many think.

The Assumptions

As pointed out in the study, the assumption fueling the explosion of AI use in legal is that it will save gobs of time. This savings will inure to the benefit of lawyers and clients, lead to fairer methods of billing like alternative fee structures, produce better results, improve access to justice, and lead to world peace. Well, maybe even the vendors would not go so far as to guarantee the last one. But vendors do seem to be guaranteeing everything but that. And pundits talk as if AI will transform legal from the ground up. Law firms are buying into the hype, investing in expensive systems that do things they barely understand.

But not so fast. All this hinges on the assumption that the time saved will vastly exceed the time spent on the additional steps needed to verify the output, and that any issues AI has with things like accuracy will soon be solved.

The Cornell study throws some cold water on all these assumptions and challenges them head on.

The Cornell Study

The study identifies two fundamental LLM flaws. The first we all know about: the propensity of these systems to hallucinate and provide inaccurate information. The study refers to this as a reality flaw. It’s a big problem in a profession like law, where being wrong can have severe consequences. The second flaw the study identifies is one of transparency: we don’t really know how these systems work.

The reality flaw, says the study, stems from the fact that generative systems “are not structurally linked to reality: namely factual accuracy…a machine learning model does not learn the facts underlying the training data but reduces that data to patterns which it then ingests and seeks to reproduce.” And the study notes that it’s not just the public systems like ChatGPT that demonstrate this flaw; the ones built for legal demonstrate it as well.

So, the study concludes, “any output generated by AI must be verified if the user wishes to satisfy themselves as to the accuracy and connection to reality, of that output—especially in legal practice.” In other words, check your cites.

The second flaw, one of transparency, is the black box problem. It in turn creates a trust issue, says the study. If you don’t know how a decision is made or a conclusion is reached, how can you trust it? 

For a legal system that depends on reasoning and logic, that’s a big issue. I would phrase it this way: how can you rely on something when you don’t know how it works or how it reached the decision it reached, and when you get different answers to the same question?

Use of AI in legal hinges on the need to be able to explain how a decision was reached. That’s a cornerstone of legal process and even the rule of law itself.

The study further concludes that neither of these flaws will be overcome anytime soon.

What Does This Mean?

The study goes on to talk about what this means. It suggests that the plethora of cases where lawyers have failed to check cites and ended up with hallucinated or inaccurate cases or facts recited in filings means lawyers are underestimating the flaws. Or have been convinced by providers that the risks are negligible.

These lawyers have simply overrelied on a tool they believed, or were led or lulled into believing, was more accurate than it is. The result so far has been a great hue and cry from everyone that you have to check your cites. Usually this is delivered with a wry grin that says it’s just dumb and lazy lawyers who are to blame. But the fact is the problem is not going away. In fact, it seems to be getting worse.

It may be that the guilty lawyers are dumb or lazy, although as I have written before, that’s not the whole story. But what’s left unsaid is something the study points out: “the net value of an AI model in legal practice can only be assessed once the efficiency gain (savings on time, salary costs, firm resource costs, etc.) is offset by the corresponding verification cost (cost to manually verify AI outputs for accuracy, completeness, relevance, etc.).” Those caught with hallucinated cases in their papers simply didn’t take the time to verify, relying instead on the AI tool.

Because the demand for accuracy in legal is so high, the study notes, the verification cost for many actions in legal is too high to offset the savings. The study also concludes that this cost is not ameliorated by automated systems since the reality and transparency risks may still exist. Hence what the study calls a verification paradox.

And we see the impact of this paradox already with fines imposed by courts for hallucinated cases. We will no doubt see malpractice and ethical violation claims. The cost of being wrong in law is just too great to not verify and verify thoroughly. 

Granted, AI can do lots of things well where the risks of being wrong are not that great. It will have an enormous impact in business and maybe other professions. But for law, not so much: “The more important the output, the more important it is to verify its accuracy.”

The study concludes:

The verification-value paradox suggests the net value of AI to legal practice is grossly overestimated, due to an underestimation of the verification cost. A proper understanding of the costly and essential nature of verification leads to the conclusion that AI’s net value will often be negligible in legal practice: that is, in most cases, the value added will not be sufficient to justify the corresponding verification cost.

The Reality

It’s easy to see the economic impact of the verification paradox when you compare the cost of getting a piece of work done by an LLM with the cost of having it done by a human. Let’s assume you ask an LLM to do some legal research that would normally take you 10 hours. You get the result, but it contains some 25 case citations. Now you have to a) check to make sure every case exists and b) make sure each case stands for the proposition the LLM says it does. By the time you do that, you could very well spend eight hours, if not more.
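To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python of the net-value calculation the study describes, using the hypothetical numbers from the example above (the 20 minutes per citation is my own assumption, not a figure from the study):

```python
# Back-of-the-envelope sketch of the study's net-value idea:
# net value = efficiency gain (hours saved) minus verification cost (hours spent checking).
# All numbers are hypothetical, drawn from the example above.

HOURS_SAVED = 10            # research the lawyer would otherwise have done by hand
CITATIONS = 25              # case citations in the LLM's output
MINUTES_PER_CITATION = 20   # assumed time to confirm a case exists and supports the point

verification_hours = CITATIONS * MINUTES_PER_CITATION / 60
net_hours_saved = HOURS_SAVED - verification_hours

print(f"Verification time: {verification_hours:.1f} hours")  # about 8.3 hours
print(f"Net time saved:    {net_hours_saved:.1f} hours")     # about 1.7 hours, before tool costs
```

If verification takes any longer per citation, or the output needs substantive rework, the net savings disappears entirely. That is the paradox in a nutshell.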

Volcano About to Erupt?

It may be too late to put the AI genie completely back in the bottle. But when it takes just as long, if not longer, to verify the results of an AI tool you’ve spent thousands of dollars on, you’re not going to be predisposed to buy more. Certainly, your clients aren’t going to be wild about your use of a tool that not only fails to save them money but costs them more and exposes them to risk.

It’s easy to envision the fundamental conclusion that using AI for many things is not worth the risk and the cost of validating its results. It’s easy to see how this fact will temper the enthusiasm for, and reliance on, AI.

We may rapidly conclude that the costs and risks of doing so are too high and simply not worth it in the long run, and perhaps even the short run. When that happens, a lot of lawyers are going to be caught with expensive systems they don’t need. A lot of vendors may have to go in other directions. A lot of venture capital may go down the drain. The proverbial volcano may be about to erupt.

That’s something worth considering before you buy the next shiny new AI toy, before you use AI shortcuts to do the hard work, before you blindly expect the people you supervise to do the right thing, and before you accept their work without question.

In the meantime, check your citations. Please.


Stephen Embry is a lawyer, speaker, blogger, and writer. He publishes TechLaw Crossroads, a blog devoted to the examination of the tension between technology, the law, and the practice of law.

Melissa Rogozinski is CEO of the RPC Round Table and RPC Strategies, LLC, a marketing and advertising firm in Miami, FL. 
