Classifying AI Systems

By Catherine Aiken and Brian Dunn
Web development by Jennifer Melot

Artificial intelligence governance is a pressing policy issue. AI systems are rapidly being adopted to assist with, or independently complete, many tasks, drawing deserved attention to their safe development and deployment. Governments are putting forth AI ethics principles, compiling AI inventories, and mandating AI risk assessments. But these efforts require a standardized approach to classifying the varied types of AI systems in use.

To that end, CSET partnered with the Organization for Economic Cooperation and Development (OECD) AI Policy Observatory and the U.S. Department of Homeland Security (DHS) Office of Strategy, Policy, and Plans to develop several frameworks for classifying AI systems. We then tested these frameworks for clarity and ease of use in a survey that asked respondents to classify example AI systems according to each framework. This interactive guide walks you through the basics of each framework and shows how respondents fared when classifying example systems with it. For more details on the process and methodology, please see the full report and the survey instrument. You can also anonymously try your hand at classifying example systems using the frameworks by clicking "take the survey" below.

The authors would like to thank Nicholas Reese; the OECD.AI working group on the classification of AI; the OECD Secretariat Directorate for Science, Technology, and Innovation, specifically Karine Perset, Louise Hatem, and Luis Aranda; and Eish Sumra, Rebecca Gelles, Alex Friedland, Lynne Weil, Dewey Murdick, and Igor Mikolic-Torreira for their guidance and assistance over the course of this research. Thanks as well to those who provided invaluable feedback and editorial support for the full report.

Certain frameworks produced more consistent and accurate classifications.

The higher-performing frameworks (C and D) more than doubled the percentage of consistent and accurate classifications compared with the lowest-performing framework (A without a rubric).

Including a summary rubric of framework dimensions improves classification.

We found a significant decrease in consistent and accurate classifications when users classified a system without a rubric, compared with when a rubric was provided.

Users were better at classifying an AI system's impact level than autonomy level.

Across all frameworks, users consistently assigned the correct level of system impact but struggled to consistently assign the correct level of system autonomy. Users were also better at classifying a system's deployment context than its technical characteristics.