Will scaling lead to artificial general intelligence?
Ṁ226 · Closes 2030 · 72% chance

Will any plain transformer model achieve 60% or more on ARC-AGI-2 by 2030?

The inference cost to achieve this result does not matter.

The model that achieves this result must use the same "transformer recipe" common between 2023 and 2025: techniques like RLHF, RLAIF, CoT, RAG, and vision encoders are allowed, but any specialized components must themselves be made of vanilla transformer blocks. Any new inductive biases, such as tree search or neurosymbolic logic, would not qualify.
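
For concreteness, a "vanilla transformer block" in the sense used here is just self-attention plus a position-wise feed-forward network joined by residual connections. The sketch below is illustrative only; the pre-norm layout, activation, and dimensions are my assumptions, not part of the resolution criteria.

```python
# Illustrative sketch of a "vanilla" transformer block (PyTorch).
# Pre-norm layout, GELU, and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class VanillaTransformerBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention sublayer with a residual connection.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Feed-forward sublayer with a residual connection.
        return x + self.ff(self.norm2(x))
```

Under this reading, a model qualifies so long as everything it computes with is stacks of blocks like the above; wrapping such a stack in an explicit search procedure would not.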

The result must be verified by at least one reputable, unaffiliated organization (ARC, Epoch, OpenAI Evals, an academic lab, etc.), or it must be publicly re-runnable (e.g., a notebook on Kaggle).

Resolution uses the ARC-AGI-2 evaluation set and scoring script as published on arcprize.org on the day this market opens. Later revisions are ignored.
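
For anyone wanting to re-run a result themselves, the heart of ARC-style scoring is exact grid matching. The sketch below is not the official script; the authoritative one is published on arcprize.org, and the pass@2 rule, file layout, and `predictions` format here are assumptions for illustration.

```python
# Hedged sketch of exact-match scoring in the style of the ARC-AGI harness.
# The pass@2 rule, directory layout, and predictions format are assumptions.
import json
from pathlib import Path

def grids_equal(a: list[list[int]], b: list[list[int]]) -> bool:
    """Exact match: same shape and same cell values."""
    return a == b

def score_task(task: dict, attempts: list) -> float:
    """Fraction of this task's test outputs solved.

    attempts[i] holds up to two candidate grids for test pair i;
    a pair counts as solved if any candidate matches exactly (pass@2).
    """
    solved = 0
    tests = task["test"]
    for i, pair in enumerate(tests):
        candidates = attempts[i] if i < len(attempts) else []
        if any(grids_equal(c, pair["output"]) for c in candidates[:2]):
            solved += 1
    return solved / len(tests)

def score_eval_set(task_dir: Path, predictions: dict) -> float:
    """Mean task score over all evaluation-set JSON files (0.0 to 1.0)."""
    scores = []
    for path in sorted(task_dir.glob("*.json")):
        task = json.loads(path.read_text())
        scores.append(score_task(task, predictions.get(path.stem, [])))
    return sum(scores) / len(scores)
```

A 60% resolution would then correspond to `score_eval_set(...) >= 0.60` under the official script's equivalent of this computation.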


Honestly, I don't believe that scoring 60% or more on ARC-AGI-2 constitutes AGI in any meaningful sense:

Humans can score 100%, not 60.

It's a single benchmark that doesn't test the full breadth of capabilities. It's entirely possible to have a system that's good at this benchmark while being useless at other tasks.

I propose renaming the question.
