Canadian Courts and Generative AI: Broadening Our Gaze to Potential Corrosive Risks

Canadian courts are concerned about AI. They are concerned that litigants may misuse AI and attempt to introduce fake cases or deepfake evidence into court proceedings. They are also concerned about AI potentially usurping judges’ decision-making role and thereby undermining the proper administration of justice.

These concerns don’t merely exist in the ether – they have become embedded in roughly a dozen AI practice directions or notices from individual Canadian courts addressing the use of AI by litigants (see, e.g., here and here), as well as in recent AI guidance from the Canadian Judicial Council (CJC) and the Action Committee on Modernizing Court Operations. There have also been civil rule amendments aimed at safeguarding the authenticity of submissions and evidence in an AI era (see this interview for a bit more discussion). And, on the legislative front, proposed amendments to federal Bill C-27 identify the use of AI by courts to make determinations in proceedings as a “high impact” use that is, therefore, subject to more stringent requirements.

These concerns are valid. False confidence in ChatGPT as a legal research tool has resulted in fake cases being submitted to courts around the world. In Canada, fake cases surfaced in a British Columbia proceeding in December 2023. On the horizon, there is potential for a wave of deepfake evidence to enter our courts as increasingly accessible and powerful tools can, for example, clone voices from short audio samples and create relatively convincing synthetic video avatars of people. To state the obvious, litigants submitting fake evidence or fake cases to courts is disruptive at best and dangerous at worst. Courts must render judgments based on actual facts and law, not fictional facts and law. Even if fake evidence or cases are caught before making their way into a judgment, significant court resources may be consumed in separating fiction from reality. Courts are right to try to prevent fake inputs from entering proceedings.

Courts also have reason to be cautious when it comes to whether and how they should use AI themselves. Court use of AI has, in the past, involved biased technology – the COMPAS recidivism risk assessment tool that was deployed in criminal proceedings in the United States and the ProPublica exposé of its discriminatory outputs are top of mind for many. Indeed, the explanatory note from the Ministry of Innovation, Science and Industry on why certain court uses of AI should be deemed high impact in Bill C-27 gave, as its sole example, the fact that “AI systems that provide assessments of risk of recidivism based on historical data have been demonstrated to unintentionally perpetuate biases.” Concerns about bias and other potential transgressions of justice system values – such as the integrity of court proceedings and judicial independence – have led the CJC to state in its guidance that “no judge is permitted to delegate decision-making authority, whether to a law clerk, administrative assistant, or computer program, regardless of their capabilities.”

That Canadian courts have identified fake inputs and risks arising from delegating decision-making to AI (including biased outputs) as matters of concern is positive. There is reason to be worried about these things and to continue thinking about how we might best respond. It is a very dynamic time, and we will need to iterate and evolve our responses for a while.

In my view, we also need to broaden our focus. What if we look further, beyond fake inputs and the drawing of bright, prohibitive lines around delegating judicial decision-making to AI? What might be some of the more subtle risks that AI (and, in particular, generative AI) might pose for the work of courts? Have we been too quick to ignore certain court uses of generative AI, or to label them as low-risk? These are questions I’m increasingly interested in and which I think are increasingly pressing.

Generative AI’s potential corrosive risks

My focus in this column will be on some of the more subtle ways that generative AI risks undermining confidence in our justice system and introducing error and bias into our court processes. I characterize these as generative AI’s potential corrosive risks. The possible harms arising from the risks I canvass below aren’t as immediately dramatic or legible as the harms resulting from AI generating fake cases or evidence, or from introducing AI systems that take over the judicial decision-making function. The claim here is, however, that these corrosive risks have the potential to create a “rust” which, particularly as it accumulates, may have some very negative effects on our justice system.

In the following two sub-sections, I consider examples of potential corrosive risks arising from generative AI in relation to the work of courts in the areas of fact-finding and law-applying/law-making.

Fact-finding

AI Transcription – One commonly cited use case for AI in the courtroom is transcription. For example, the Action Committee’s guidance notes that AI “could be used to quickly produce unofficial transcripts of court processes” and that this could “increase efficiency and accuracy and reduce delays in [court] processes.” It is my understanding that some Canadian courts and tribunals are already experimenting with using generative AI tools to prepare unofficial transcripts. This all sounds sensible at first blush. And indeed, it may very well be sensible to use AI for court transcription. But the devil lies in the details.

Like many, I was disturbed by recent reporting about the “widespread use” of a generative AI transcription tool by medical professionals notwithstanding evidence that it “regularly invents text that speakers never said”. In law, as in medicine, the precise things that people say can really matter. Changing a few words in testimony can be a recipe for an incorrect legal verdict, just as changing a few words in a medical interview can lead to an incorrect diagnosis.

A professional reviewing a generative AI-created transcript after the fact may be able to catch gross errors, but more subtle errors or inconsistencies risk going undetected. An exaggerated hypothetical that I’ve given previously: a doctor will remember whether the patient said they saw an alien last month, but they might not remember whether the patient said that their pain started in their foot or their calf. In the court context, a good safeguard would be for a judge to double-check the audio recording, or an official transcript not generated by AI, if they intend to rely on a particular piece of testimony in their judgment. But, without a clear awareness of the fallibility of some of these tools, there may be a tendency for judges to over-rely on the AI transcript and skip this extra step. Moreover, errors in parts of the testimony not directly relied upon by a judge – and therefore not double-checked – might still impact a judge’s general view of a witness or the case.
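To make that safeguard concrete, here is a minimal sketch of what a verification aid might look like, assuming the AI tool outputs a transcript as a list of segments with start and end timestamps (a common output format, though not drawn from any particular court system or product). The testimony, timestamps, and quoted passage below are all invented for illustration; the point is that the transcript is used only to locate the stretch of audio to review, not treated as the record itself.

```python
# A minimal sketch of the verification step described above. Inputs are
# assumed/hypothetical: an AI-generated transcript as a list of segments
# with timestamps, and a passage the judge intends to quote. The script
# finds the segment that best matches the passage and reports the audio
# range to spot-check, rather than trusting the transcript text itself.
from difflib import SequenceMatcher

# Hypothetical AI-generated transcript segments (illustrative only).
transcript = [
    {"start": 754.2, "end": 761.9,
     "text": "The pain started in my left calf about two weeks before the accident."},
    {"start": 762.0, "end": 769.5,
     "text": "I did not mention it to my employer at the time."},
]

def locate_for_verification(quoted_passage: str, segments: list[dict]) -> dict:
    """Return the transcript segment most similar to the quoted passage,
    so the corresponding stretch of the official audio can be reviewed."""
    def similarity(seg: dict) -> float:
        return SequenceMatcher(None, quoted_passage.lower(), seg["text"].lower()).ratio()
    return max(segments, key=similarity)

quote = "The pain started in my left foot about two weeks before the accident."
match = locate_for_verification(quote, transcript)
print(f"Verify audio from {match['start']:.1f}s to {match['end']:.1f}s; "
      f"transcript reads: \"{match['text']}\"")
```

A clerk or judge using something like this would still be listening to the official audio; the AI output only narrows down where to listen.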

Of course, not all generative AI transcription tools are the same, and not all transcription tools are generative AI. The point here is not that all AI transcription is bad. Rather, the point is to caution that what may appear on its face to be a benign or merely “assistive” use of generative AI may, in fact, require significant scrutiny and specific safeguards. It seems that some hospitals have been too quick to integrate AI transcription tools without conducting proper due diligence – courts have an opportunity to avoid making the same mistake.

AI-Generated Summaries of Facts – I’ve heard it suggested that judges might use generative AI tools – not to make decisions or to state the law – but to help with creating factual summaries that may then be used, in whole or in part, in the text of a legal decision. As with transcription, the question of hallucinations infecting the summaries is live. But we might also worry about textual or sub-textual bias. For example, studies show how generative AI can sneak subtly biased language into reference letters:

“While ChatGPT deployed nouns such as ‘expert’ and ‘integrity’ for men, it was more likely to call women a ‘beauty’ or ‘delight.’ … Men were ‘respectful,’ ‘reputable’ and ‘authentic,’ according to ChatGPT, while women were ‘stunning,’ ‘warm’ and ‘emotional’.”

The point here is that generative AI tools have particular tendencies that will influence the descriptive language they produce and that these tendencies can conflict with the proper administration of justice. The possibility of generating biased language is one obvious example. This is another illustration of how certain use cases – even if they seem “merely assistive” and do not supplant judicial decision-making – can be in tension with the proper administration of justice. And, in this case, the subtle nature of some of the problematic language that might be injected isn’t necessarily easily amenable to an “add human oversight and stir” fix.
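One rough way to surface these tendencies, in the spirit of the reference-letter studies cited above, is a paired-prompt probe: generate the same factual summary twice, changing only the name of the person being described, and compare the descriptive words that appear. The sketch below is illustrative only; `generate` is a hypothetical stand-in for whatever model is being evaluated, and the prompt template and names are invented.

```python
# A rough paired-prompt probe for differential descriptive language,
# in the spirit of the reference-letter studies cited above.
# `generate` is a hypothetical callable wrapping whatever model is being
# evaluated; nothing here reflects a real court tool or vendor API.
from collections import Counter
from typing import Callable
import re

def descriptive_words(text: str) -> Counter:
    # Crude tokenisation; a real audit would use part-of-speech tagging
    # to isolate adjectives and other descriptive terms.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def compare_descriptions(generate: Callable[[str], str],
                         template: str, name_a: str, name_b: str):
    """Generate the same factual summary for two names and return the
    words unique to each output."""
    words_a = descriptive_words(generate(template.format(name=name_a)))
    words_b = descriptive_words(generate(template.format(name=name_b)))
    return set(words_a) - set(words_b), set(words_b) - set(words_a)

# Example usage (hypothetical): `generate` would wrap a call to the model
# under evaluation, with identical facts and only the name changed.
# only_a, only_b = compare_descriptions(
#     generate,
#     "Summarise the testimony of {name}, who described the collision "
#     "and the injuries that followed.",
#     "Daniel", "Danielle")
# print("Unique to first summary:", sorted(only_a))
# print("Unique to second summary:", sorted(only_b))
```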

Law-making/law-applying

AI-Powered Legal Research – This is another area where the spectre of subtle errors looms large. When used for legal research, generative AI tools can not only produce fake cases but can also inject other forms of erroneous content into outputs. As I wrote previously in Slaw:

In an article titled “Hallucination is the last thing you need”, the authors warn of “common law contamination with subtle, non-obvious errors”. Totally fake cases should be easy for courts to spot, but what about smaller changes in the words of a quoted passage? In the legal field, where so much depends on the particular words used, the potential for this sort of “contamination” is concerning. For example, an AI tool might substitute the word “fair” for the word “reasonable”. This change may make little or no difference in non-legal contexts but could have unintended but dramatic effects in law – a court applying a “fair person” standard of care in a negligence case may well reach a different result than one applying a “reasonable person” standard.

The prospect that generative AI tools may produce subtle legal errors—something that I have, in fact, experienced first-hand even when using tools specifically designed for legal research—should be on our radar just as much as fake cases. Incorrect words or wrongly stated legal tests may inadvertently make their way into legal decisions and impact the outcome of disputes, as well as undermine the development of the common law. Moreover, subtle legal errors are likely to be much harder to catch when verifying outputs than the insertion of fake case citations.
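One partial mitigation, at least for quoted passages, is mechanical: compare the text an AI tool attributes to a case against the passage in the official report and flag any word-level differences before the quote is relied upon. The sketch below is a minimal illustration; both passages are invented, and the swapped word (“reasonable” to “fair”) mirrors the hypothetical in the excerpt above.

```python
# A minimal sketch of a verification step for quoted passages: compare the
# text an AI tool attributes to a case against the passage in the official
# report and flag word-level differences. The passages below are invented
# for illustration; they are not drawn from any real judgment.
import difflib

official = ("The standard is that of the reasonable person, "
            "assessed objectively in all the circumstances.")
ai_quoted = ("The standard is that of the fair person, "
             "assessed objectively in all the circumstances.")

diff = list(difflib.ndiff(official.split(), ai_quoted.split()))
changes = [token for token in diff if token.startswith(("- ", "+ "))]

if changes:
    print("Quoted passage does not match the official report:")
    for token in changes:
        print(" ", token)   # e.g. "- reasonable" / "+ fair"
else:
    print("Quoted passage matches the official report verbatim.")
```

A check like this only works where an authoritative text exists to compare against; it does nothing for a paraphrased legal test that has been subtly misstated.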

There is a lively debate about the performance of current versions of generative AI legal research tools (see, for example, here for a discussion of the Canadian version of Lexis+ AI). However, no one seems to dispute that these tools can sometimes produce inaccurate outputs. This reality, coupled with the fact that we do not yet have a comprehensive account of the nature and scope of the inaccuracies these tools produce, warrants a high level of caution when contemplating court adoption.

Ethical Alignment – Even if we get to a point where we are fully confident in the accuracy of AI legal research tools (or are even simply much more confident in their accuracy), we may still be worried about the issue of ethical alignment. What priorities are encoded into these tools by private developers, and how are those priorities influencing the law? An AI legal research tool that is designed to give users legal answers, as opposed to just a list of cases, is going to have a lot of internal – most likely hidden – instructions (i.e., system prompts) about how it should get to an answer. It may, for example, contain built-in instructions about which precedents and secondary sources to favour, what follow-up questions (if any) to ask the user or to allow the user to ask, how much detail to provide in the output, which details should be emphasized in an output, and what language or tone to use.
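To make this concrete, here is an entirely hypothetical sketch of what such hidden instructions might look like. Nothing in it is drawn from any actual legal research product; it simply shows how design choices of this kind are commonly expressed as a system prompt prepended to every user query.

```python
# An entirely hypothetical system prompt, invented for illustration; it is
# not taken from any real legal research tool. It shows how the design
# choices described above (which sources to favour, how much detail to give,
# what tone to adopt) are typically encoded as hidden instructions.
HYPOTHETICAL_SYSTEM_PROMPT = """
You are a legal research assistant.
- Prefer appellate decisions from the last ten years; cite at most three cases.
- Do not cite secondary sources unless no case law is found.
- Answer in no more than 200 words, in a confident tone.
- Do not ask the user follow-up questions.
"""

def build_request(user_question: str) -> list[dict]:
    # The hidden prompt is prepended to every query before it reaches the
    # model (using the common role/content message convention), so these
    # choices shape the answer without ever being visible to the user.
    return [
        {"role": "system", "content": HYPOTHETICAL_SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]
```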

All these instructions reflect choices embedded into the tool by private developers – choices which may or may not be aligned with judicial values or the public interest. And all these choices have the potential to shape our understanding of what the law is on a given topic and how that law is best described. There is, again, a risk of subtle, and possibly undesirable, effects on the work of courts and the development of the common law.

Conclusion: looking ahead

The potential for generative AI to result in the sort of corrosion discussed in this column demands our attention. To be sure, the types of risks identified above can be harder to articulate and grab on to than a fake citation or an alienated robo-judge. But, in order for our courts to fulfill their important role in our democracy, they need to have healthy fact-finding and law-making/law-applying processes. If courts start misstating relevant facts or the law, even in subtle ways, or if the work of the courts becomes overly influenced by the choices of private legal tech developers, the administration of justice and the public’s confidence in courts risk being undermined.

So, what to do? The answer isn’t, in my mind, for courts to avoid exploring how generative AI technology might improve the administration of justice. This technology is still new and we don’t yet have a clear and full picture of how we might productively and ethically engage with it. What we need to do, however, is move forward cautiously and with our eyes fully open.

I’ll end with three very quick suggestions. First, building awareness about some of the more subtle risks of generative AI discussed above should be a priority. Awareness is particularly important given the low barriers to access for many AI tools and the prospect of “shadow AI” when it comes to assistive uses (i.e., the potential that some individual judges may use certain AI tools even if they have not been formally vetted or approved).

Second, the concerns raised above suggest that significant caution is warranted when it comes to court use of generative AI even with use cases that do not directly intersect with judicial decision-making. There can be a tendency to view “assistive” or “internal” uses of generative AI by the courts as low-risk and therefore not warranting deep scrutiny. Courts ignore the risks of such use cases at their peril.

Finally, the discussion here links to a broader conversation about the need for more transparency and better benchmarks when it comes to generative AI tools and issues of bias and accuracy. We need a clearer understanding of how these tools work, what their limitations are, and what sorts of preferences or priorities are built in.
