Tag: Featured
-
Deprecating Benchmarks: Criteria and Framework
As AI models rapidly advance, many benchmarks become outdated or flawed yet continue to be used, inflating performance claims and obscuring safety concerns. This paper introduces criteria and a framework for deprecating inadequate benchmarks, with recommendations for developers, policymakers, and governance actors on how to maintain rigorous evaluation standards.
-
Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems
A comprehensive, public-domain catalog of AI risks and safety measures designed to support global AI regulation and standards development. This resource documents risk sources and management measures across the entire AI lifecycle, from development through deployment.
