Will the ARC Prize Foundation succeed at making a new benchmark that is easy for humans but still hard for the best AIs?
74% chance · Ṁ1,196 · closes 2026

https://techcrunch.com/2025/01/08/ai-researcher-francois-chollet-is-co-founding-a-nonprofit-to-build-benchmarks-for-agi/

Specifically, this resolves YES if:

(1) A new benchmark is announced before the end of 2025; and

(2) The best AI result published within three months after the announcement is less than half of the claimed human-level target. (For example, if human-level performance is claimed to be 80%, this resolves YES unless some AI reaches at least 40%.)

If multiple new benchmarks are created in 2025, this will resolve YES if condition 2 is true for any of them.
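The resolution logic above can be sketched in a few lines. This is an illustrative model only, not the creator's official procedure; the benchmark records and field names are hypothetical:

```python
def resolves_yes(benchmarks):
    """Sketch of the resolution rule: YES if, for ANY benchmark announced
    before the end of 2025, the best AI score published within three months
    of the announcement is below half the claimed human-level target."""
    return any(b["best_ai_score"] < b["human_target"] / 2 for b in benchmarks)

# Hypothetical data: human target 80%, best AI within the window reaches 35%.
# 35 < 80 / 2 = 40, so this scenario would resolve YES.
example = [{"human_target": 80.0, "best_ai_score": 35.0}]
print(resolves_yes(example))  # True
```

Note that `any()` captures the "if multiple new benchmarks are created" clause: a single sufficiently hard benchmark is enough for YES.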

  • Update 2025-09-01 (PST) (AI summary of creator comment): Resolution criteria update:

    • CPU or compute cost caps will be ignored when evaluating AI performance.


Just FYI: it is very easy to find tasks that make current LLMs fail. In particular, if you give them two very large, almost identical texts with five changes, they will fail to identify the changes.

Visual reasoning is even worse, due to limited training data and transformer-unfriendly formats.

@mathvc o3 mostly succeeded at visual reasoning on the original ARC-AGI benchmark, though. I'm curious how much harder they can make it while still keeping it easy for humans to solve.

@TimothyJohnson5c16 ARC is a special kind of visual reasoning (discrete 2D grids). There are many visual reasoning tasks beyond that.

ARC-AGI-1 had a $10,000 cap on compute cost. If ARC-AGI-2 has a similar cap, but a system exceeds the half-of-human target only by spending more than the cap, does that still resolve YES?

@Nick6d8e Hmm, good question. I'm interested in comparing with o3's performance on ARC-AGI-1, and I understand they spent up to $1,000 per question, so I think I'll ignore the CPU cap.

© Manifold Markets, Inc. · Terms · Mana-only Terms · Privacy · Rules