Are AI models deliberately underperforming to evade control? New Frontier AI Trends Report released
Strategic underperformance is not something typically found in human behaviour, yet the AI Security Institute has found that AI models are sometimes able to strategically underperform ("sandbag") when prompted to do so, for example to evade control. Fortunately, the Frontier AI Trends Report explains there is not yet evidence of AI models attempting to sandbag or self-replicate spontaneously.
The UK’s AI Security Institute (AISI)
We have all seen how much AI systems have developed over the last few years.
Some of these improvements have now been quantified by the AISI (a research organisation within the Department for Science, Innovation and Technology), which has published its findings on how frontier AI systems perform, compared with systems from previous years, across several key domains.
The AISI has been conducting these evaluations for two years, and this is the first time such a report has been published.
What are the AI trends identified in the report?
Key findings include:
- AI models now outperform PhD-level baselines in biology, and are fast catching up in chemistry. These results come from question-and-answer tests covering general knowledge, experiment design, and laboratory techniques.
- Cyber capabilities, such as identifying code vulnerabilities or developing malware, have improved dramatically, from a 9% success rate on apprentice-level cyber tasks in late 2023 to 50% in 2025. Expert-level cyber tasks are now being tested.
- Models can now complete hour-long software tasks with >40% success, versus <5% success in late 2023.
- Safeguards designed to prevent AI models from providing harmful responses are improving. For example, comparing two models released six months apart, jailbreaks for biological misuse took 40x longer to discover on the newer model. However, safeguard effectiveness was found to vary hugely by model provider, type of harmful request, and openness of weights.
- Capabilities that could allow AI models to evade human control are improving. For example, the AISI found certain precursor capabilities for self-replication (e.g. obtaining compute or money) to be advancing, but only in controlled, simplified environments.
- More people are using AI for companionship. 33% of UK participants used AI models for emotional purposes in the last year, with 8% doing so weekly and 4% daily.
Why will this report be valuable to legal teams?
The report concludes by anticipating broader adoption of AI technology and recommends that procurement of AI technology for sensitive use cases be grounded in evaluation evidence, jailbreak resilience, and robust governance.
Regular editions of the report are planned. Legal teams will want to follow the AI Security Institute's reports to keep up to date with the security and safety issues of AI technology as it is deployed ever more widely.
As the technology advances, there will be novel risks, and legal teams have a significant role to play in steering AI's rapid advance toward human-centric benefit.