Here is an excerpt from an article written by Cathy O’Neil, Jake Appel, and Sam Tyner-Monroe for the MIT Sloan Management Review. To read the complete article, check out others, sign up for email alerts, and obtain subscription information, please click here.
Illustration Credit: Paul Garland
* * *
How do we know whether algorithmic systems are working as intended? A set of simple frameworks can help even nontechnical organizations check the functioning of their AI tools.
Artificial intelligence, large language models (LLMs), and other algorithms are increasingly taking over bureaucratic processes traditionally performed by humans, whether it’s deciding who is worthy of credit, a job, or admission to college, or compiling a year-end review or hospital admission notes.
But how do we know that these systems are working as intended? And who might they be unintentionally harming?
Given the highly sophisticated and stochastic nature of these new technologies, we might throw up our hands at such questions. After all, not even the engineers who build these systems claim to understand them entirely or to know how to predict or control them. But given their ubiquity and the high stakes in many use cases, it is important that we find ways to answer questions about the unintended harms they may cause. In this article, we offer a set of tools for auditing and improving the safety of any algorithm or AI tool, regardless of whether those deploying it understand its inner workings.
Algorithmic auditing is based on a simple idea: Identify failure scenarios for people who might get hurt by an algorithmic system, and figure out how to monitor for them. This approach relies on knowing the complete use case: how the technology is being used, by and for whom, and for what purpose. In other words, each algorithm in each use case requires separate consideration of the ways it can be used for — or against — someone in that scenario.
This applies to LLMs as well, which require an application-specific approach to harm measurement and mitigation. LLMs are complex, but it’s not their technical complexity that makes auditing them a challenge; rather, it’s the myriad use cases to which they are applied. The way forward is to audit how they are applied, one use case at a time, starting with those in which the stakes are highest.
The auditing frameworks we present below require input from diverse stakeholders, including affected communities and domain experts, through inclusive, nontechnical discussions to address the critical questions of who could be harmed and how. Our approach works for any rule-based system that affects stakeholders, including generative AI, big data risk scores, or bureaucratic processes described in a flowchart. This kind of flexibility is important, given how quickly new technologies are being developed and applied.
* * *
Here is a direct link to the complete article.