While OpenAI has claimed that o3-mini achieved 32% on FrontierMath, I don't really believe that number, and in any case they used an ungodly amount of compute to get it.
When judging how much progress has been made on FrontierMath, I prefer to defer to Epoch. The highest Epoch-validated FrontierMath score so far belongs to o3-mini-high, at 11%.
At the end of 2026, what will be the highest score on FrontierMath, according to Epoch? To resolve this, I will use their AI Benchmarking Hub, or -- if that page becomes out of date -- whatever I consider to be the authoritative Epoch source on FrontierMath.
It seems plausible that Epoch will give different numbers depending on the amount of compute, scaffolding, etc. If so, I will resolve this to the highest number reported by Epoch -- though note that a number only counts if it was validated by Epoch. If Epoch lists self-reported numbers from a lab that it has not validated, then those numbers do not count for the resolution of this market.
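As a rough illustration of the rule above, here is a minimal Python sketch: drop any score Epoch has not validated, then take the maximum over whatever remains, regardless of compute or scaffolding. The `Entry` type, its field names, and the `resolve` function are hypothetical names made up for this sketch, not anything Epoch publishes. The two sample entries are just the numbers mentioned in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # Hypothetical record for one reported FrontierMath result.
    model: str
    score_pct: float
    validated_by_epoch: bool


def resolve(entries: list[Entry]) -> Optional[float]:
    """Return the highest Epoch-validated score, or None if there is none."""
    # Self-reported numbers that Epoch has not validated are ignored.
    validated = [e.score_pct for e in entries if e.validated_by_epoch]
    return max(validated) if validated else None


# Sample data taken from the description above, not from Epoch's hub.
entries = [
    Entry("o3-mini-high", 11.0, validated_by_epoch=True),
    Entry("o3-mini", 32.0, validated_by_epoch=False),  # OpenAI's own claim
]

print(resolve(entries))
```

With these two entries the unvalidated 32% is skipped, so the sketch resolves to 11%.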