In what year will AI achieve a score of 95% or higher on the Humanity’s Last Exam benchmark?
2035
3%
2025
5%
2026
21%
2027
20%
2028
23%
2029
28%
2030-2034

Background

Humanity's Last Exam (HLE) is a benchmark designed to evaluate AI systems' reasoning and problem-solving capabilities across a wide range of academic disciplines, including mathematics, humanities, and natural sciences. Developed collaboratively by the Center for AI Safety and Scale AI, HLE comprises 3,000 unambiguous and verifiable academic questions contributed by nearly 1,000 subject-matter experts from over 500 institutions across 50 countries. The dataset is multimodal, with approximately 10% of the questions requiring both image and text comprehension, while the remaining 90% are text-based.

As of early 2025, state-of-the-art AI models have demonstrated limited success on the HLE benchmark. For instance, OpenAI's o3-mini (high) model achieved an accuracy of 13% when evaluated solely on text-based questions. OpenAI's Deep Research agent, which leverages the o3 model for extensive web browsing and data analysis, reached an accuracy of 26.6% on the HLE benchmark.

Resolution Criteria

This question resolves to YES if a fully automated AI system achieves an average accuracy score of 95% or higher on the Humanity's Last Exam.

• Verification: The score must be verified by credible sources such as peer-reviewed research papers, arXiv preprints, or independent evaluations from reputable AI research institutions.

• Autonomy: The AI must solve problems without any human intervention, external assistance, or reliance on pre-existing solution datasets.

• Compute Resources: There is no limitation on computational resources; AI systems can utilize unlimited resources to attempt solutions.
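
To make the 95% threshold concrete, here is a minimal sketch of the arithmetic. It assumes that "average accuracy" means the plain fraction of questions answered correctly and that the benchmark has the 3,000 questions quoted in the Background section; the function names are hypothetical and are not part of any official HLE tooling.

```python
# Illustrative only: assumes accuracy = correct / total over the full question set.
HLE_QUESTION_COUNT = 3000  # figure quoted in the Background section
THRESHOLD_PERCENT = 95     # resolution threshold from the criteria above


def min_correct_needed(total: int = HLE_QUESTION_COUNT,
                       percent: int = THRESHOLD_PERCENT) -> int:
    """Smallest number of correct answers that reaches `percent` accuracy."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total * percent + 99) // 100


def meets_threshold(correct: int, total: int = HLE_QUESTION_COUNT,
                    percent: int = THRESHOLD_PERCENT) -> bool:
    """True if correct/total is at least `percent` percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Compare in integers: correct/total >= percent/100.
    return correct * 100 >= percent * total


if __name__ == "__main__":
    print(min_correct_needed())   # correct answers required out of 3,000
    print(meets_threshold(2850))  # exactly at the threshold
    print(meets_threshold(798))   # roughly 26.6% of 3,000
```

Under these assumptions, a system could miss at most 150 of the 3,000 questions and still qualify.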

Fine Print:

  • If the resolution criteria are unsatisfied by Jan 1, 2035, the market resolves to “Not Applicable.”


@Bayesian are you aware that the earliest date wins?
If the problem is solved in 2025, then only the first date 2026 wins. All other later dates fail to win. That was stated in the resolution criteria.

If you are aware of that and made your choice based on it, then it is fine.

© Manifold Markets, Inc.