Tag: Featured

Deprecating Benchmarks: Criteria and Framework

July 8, 2025

As AI models rapidly advance, many benchmarks become outdated or flawed yet continue to be used, inflating performance claims and obscuring safety concerns. This paper introduces criteria and a framework for deprecating inadequate benchmarks, with recommendations for developers, policymakers, and governance actors on how to maintain rigorous evaluation standards.

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

October 30, 2024

A comprehensive, public-domain catalog of AI risks and safety measures designed to support global AI regulation and standards development. This resource documents risk sources and management measures across the entire AI lifecycle, from development through deployment.
