AI safety standards worldwide must keep pace with the rapid development and deployment of AI technology. Our mission is to help accelerate the writing of those standards.

Featured works

  • Deprecating Benchmarks: Criteria and Framework

    As AI models rapidly advance, many benchmarks become outdated or flawed yet continue to be used, inflating performance claims and obscuring safety concerns. This paper introduces criteria and a framework for deprecating inadequate benchmarks, with recommendations for developers, policymakers, and governance actors on how to maintain rigorous evaluation standards.


  • Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

    A comprehensive, public domain catalog of AI risks and safety measures designed to support global AI regulation and standards development. This resource documents risk sources and management measures across the entire AI lifecycle, from development through deployment.


Recent outputs & updates

  • Our Input to the European Commission on the Future of European Standardisation

    We provided input to the European Commission's consultation on the future of European standardisation. This includes a detailed proposal for a new, more efficient, and more inclusive process for writing harmonised standards to support digital and green legislation.


  • We presented a poster at the AI and Societal Robustness Conference

    Rokas Gipiškis (AI Standards Lab) and Rebecca Scholefield presented the poster “AI Incident Reporting: Pipeline and Principles” at the AI and Societal Robustness Conference in Cambridge, organised by the UK AI Forum. The work examines post-deployment AI incidents through an end-to-end pipeline spanning definitions and taxonomies, monitoring, reporting, and downstream analysis (including multi-causal approaches and…


  • Agentic Product Maturity Ladder V0.1

    MLCommons has released the Agentic Product Maturity Ladder V0.1, a systematic framework defining six progressive maturity levels (R0–R5) for benchmarking AI agent reliability. An initial assessment of four task domains shows that no agents yet meet the thresholds for product-level capability benchmarking.


  • Our Input to the European Commission on the Reporting of Serious AI Incidents

    We provided feedback to the European Commission's consultation on Article 73 of the AI Act, which concerns serious incident reporting for high-risk AI systems. Our submission addressed definitional clarity, practical implementation challenges, edge scenarios, and coordination between overlapping EU reporting frameworks.


  • Our Input to the European Commission on the Digital Simplification Package and Omnibus

    We provided input to the European Commission's Digital Omnibus consultation on the EU AI Act. Drawing on our participation in the Code of Practice process and in CEN-CENELEC standardisation, we addressed high-risk AI classification issues, GPAI provider obligations, and delays in standards development, recommending grace periods for smaller entities and refined classification criteria.