If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.
I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)
Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess, without being created specifically for doing so (like humans aren't chess-playing machines). Some of my previous comments follow.
1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they choose to call it an LLM or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
2- The model can write as much as it wants to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.
I won't bet on this market and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.
Update 2025-01-21 (PST) (AI summary of creator comment): - LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a Large Language Model (LLM) to qualify for this market.
- Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.
Update 2025-06-14 (PST) (AI summary of creator comment): The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation.
Update 2025-09-06 (PST) (AI summary of creator comment): - Time control: No constraints. Blitz, rapid, classical, or casual online games all count if other criteria are met.
- "Fun game" clause: Still applies, but the bar to exclude a game as "for fun" is high; unusual openings or quick, unpretentious play alone don't make it a "fun" game.
- Super grandmaster: The opponent must have the GM title and a classical Elo rating of 2700 or higher.
Update 2025-09-11 (PST) (AI summary of creator comment): - Reasoning models are fair game (subject to all other criteria).
Update 2025-09-13 (PST) (AI summary of creator comment): Sub-agents/parallel self-calls
An LLM may spawn and coordinate multiple parallel instances of itself (same model/weights) to evaluate candidate moves or perform tree search, including recursively. This is considered internal reasoning and is allowed.
Using non-LLM tools or external resources (e.g., chess engines like Stockfish, databases) remains disallowed.
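To make the sub-agent allowance concrete, here is a minimal sketch of what the permitted pattern could look like, assuming a hypothetical query_model helper standing in for however the LLM is actually invoked. The same weights score each candidate move in parallel, and nothing outside the model (no engine, no database) is consulted.

```python
# Minimal sketch of the allowed "parallel self-calls" pattern: the same
# model is queried once per candidate move and the best-scored move is
# played. query_model is a hypothetical placeholder for however the LLM
# is actually invoked; no chess engine or database is consulted.
from concurrent.futures import ThreadPoolExecutor

def query_model(prompt: str) -> float:
    """Placeholder for a call into the same LLM. It returns a dummy score
    so the sketch runs end to end; a real call would parse the model's
    textual evaluation into a number."""
    return float(len(prompt) % 7)

def pick_move(position_fen: str, candidate_moves: list[str]) -> str:
    prompts = [
        f"Position (FEN): {position_fen}\n"
        f"Evaluate the move {move} for the side to move, from -10 to 10."
        for move in candidate_moves
    ]
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(query_model, prompts))
    # One parallel instance of the model per candidate; highest score wins.
    return max(zip(candidate_moves, scores), key=lambda pair: pair[1])[0]

start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(pick_move(start_fen, ["e4", "d4", "Nf3", "c4"]))
```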
Someone below mentioned that the criteria are really specific. This is true, but I also think the conjunction of the specific things is even less likely than the individual things happening on their own. Why would the super GM play blind if the LLM were good enough to make it a challenging game? At the moment it makes for good content because the LLMs just randomly play illegal moves, but if at some point they're actually good, I would expect the standard interface not to be blind chess any more.
In most worlds where LLMs are better than super GMs, I still don't think they ever publicly win a blindfold game against one.
[The] idea is to check whether a general intelligence can play chess, without being created specifically for doing so (like humans aren't chess playing machines).
a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
Using non-LLM tools or external resources (e.g., chess engines like Stockfish, databases) remains disallowed
@ShitakiIntaki the LLM wouldn’t be using Stockfish though, it’d be using a set of weights that just mysteriously happens to act like Stockfish 🤷♀️
such a thing by design would be as generally applicable as an LLM, as long as you encode questions to it as chess puzzles where the answer corresponds to the correct move :P
(lastly, I think such a thing would be hard to market to anyone, so I can’t imagine it being professionally billed as a chess engine)
I am not familiar with the nuances but the proposal doesn't feel like it passes the "smell test" of not being "created specifically" to play chess. 😉
@ShitakiIntaki fair.
counterpoint: you are quoting rules about someone else’s fake money market you don’t control at a cartoon horse on the internet
@ShitakiIntaki There was a quote somewhere about how modern grandmasters have their fathers whisper them opening lines while they are still in the womb.
When most of a GM's childhood is spent studying board evals, it's hard to claim that they aren't in some way specifically designed for the game.
@Lilemont GPT-3 was intentionally trained on Stockfish data in the hope of boosting total IQ, which they later found was of no use.
https://dubesor.de/chess/chess-leaderboard
This leaderboard paints a MUUUCH direr outlook. It says GPT-3.5-Turbo, a 2022 model, plays at the 1200 level, while GPT-5 plays at 1,500. Superhuman chess by generalist LLMs only in 2040.
@MP A 2400 Elo has a 4% chance of winning against a 2700 Elo, and this does happen in tournament chess. So the more accurate linear fit is at 2037.
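For reference, the standard Elo expected-score formula (which counts a draw as half a point) gives roughly 0.15 for a 300-point rating gap; the ~4% outright win figure quoted above is plausible once draws are factored in. A minimal sketch of that calculation:

```python
# Standard Elo expected-score formula. A draw counts as half a point, so
# the weaker side's outright win probability is lower than this number.
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

print(f"{expected_score(2400, 2700):.3f}")  # ~0.151 for a 300-point gap
```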
Realistically, I would bet no later than 2035 for the fulfillment of these criteria, assuming no inexplicable (or otherwise) slowdown. Why is no one pumping this market with NO shares if you think what you said is credible? Instead, it's at 52%. FWIW I was a NO holder but I sold for now.
Also, updates: GPT-5 Codex is now at 1596 Elo in mixed mode. GPT-5 was at 1485 Elo on Sep 15.
It should be noted that GPT-5 Codex is 1836 in reasoning (but 1284 in continuation).
@MP It's even worse than that. The ratings there are not standardized against human ratings, and I believe they're vastly inflated wrt FIDE ratings.
@MP Are you saying that the evaluation of playing skill by the best chess player in the world, Stockfish 17.1, is inflated? That's the base rating. It's extraordinarily unlikely that they're off by more than like 30 points.
@Lilemont Stockfish gives a reasonable accuracy score. However, accuracy score does not cleanly convert to rating in general. See here: https://lichess.org/page/accuracy
Additionally, the author converts it using a formula given in citation one here: https://dubesor.de/chess/chess-leaderboard. The formula is:
Initial_Elo = 400 + 200 × (2^((Accuracy-30)/20) - 1)
Where:
- Accuracy = Average accuracy across first 10 non-self-play games (%)
- Accuracy is constrained between 10% and 90%
- Human players start at 1500 Elo regardless of accuracy
- Default fallback: 1000 Elo if no accuracy data available
I have no idea where they got this formula from. It's probably fine for creating a leaderboard where ratings are self-consistent, but unless the author provides data to the contrary, I don't think there's any reason to think this conversion from accuracy to Elo remotely corresponds to the rough correlation that would be found between human accuracy and FIDE Elo.
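Read literally, the conversion quoted above works out to the following sketch (my transcription of the stated formula and constraints, not code from the leaderboard itself):

```python
# Transcription of the leaderboard's stated accuracy -> Elo conversion,
# as quoted above. Illustration only; not the leaderboard's source code.
from typing import Optional

def initial_elo(accuracy: Optional[float], is_human: bool = False) -> float:
    if is_human:
        return 1500.0                      # humans start at 1500 regardless of accuracy
    if accuracy is None:
        return 1000.0                      # default fallback with no accuracy data
    acc = min(max(accuracy, 10.0), 90.0)   # accuracy constrained to 10%..90%
    return 400.0 + 200.0 * (2.0 ** ((acc - 30.0) / 20.0) - 1.0)

# Example: 60% average accuracy over the first 10 non-self-play games
print(round(initial_elo(60.0)))  # ~766
```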
@Lilemont To your edit, I believe they are not off by just 30 points but many, many hundreds of points. Are you a chess player who has played the models? I'm a National Master and I believe there is absolutely no way the models are that strong.
@Lilemont This leaderboard seems to me to be much more accurate: https://maxim-saplin.github.io/llm_chess/
Unfortunately it doesn't have some of the latest models. But, they fix the exact problem that I was talking about: "We've added the Komodo Dragon Chess Engine as a more capable opponent, which is also Elo-rated on chess.com. This allowed us to anchor the results to a real-world rating scale and compute an Elo rating for each model."
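For intuition, the anchoring works roughly like this: once the models play a fixed opponent whose chess.com rating is known, their score against it can be pushed back through the Elo expected-score formula to get a performance rating on that same scale. A minimal sketch of the idea (not the llm_chess project's actual computation; the numbers are hypothetical):

```python
import math

def performance_rating(opponent_rating: float, score: float) -> float:
    """Invert the Elo expected-score formula: given an average score
    (strictly between 0 and 1) against a single rated opponent, return
    the rating that would be expected to produce that score."""
    return opponent_rating - 400.0 * math.log10(1.0 / score - 1.0)

# Hypothetical example: scoring 25% against an engine rated 1600
print(round(performance_rating(1600, 0.25)))  # ~1409
```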
@DanielJohnston This leaderboard assigns negative Elo, which is usually not done in chess.
No, the conversion based on accuracy is not perfect, but it is approximated from the performance of human players and the correlation between accuracy and rating.
@Lilemont You're right that negative Elo is usually not done. However, that's for practical rather than theoretical reasons. The USCF actually used to have negative ratings a few decades ago, but removed them (I'm guessing for psychological reasons lol) and instead instituted a rating floor at 100. Having negative ratings rather than a rating floor actually makes the rating system more accurate and consistent, as a rating floor creates inflation. FIDE long avoided this problem by simply bumping anyone below their minimum rating off the list entirely.
Yes, a conversion based on accuracy could be relatively accurate, particularly if it takes into account position complexity, whether the opening is already known, etc. My understanding is that Ken Regan has a way to measure rating quite accurately based on games. However, as I said, I don't know the source of the conversion formula used in the leaderboard we're talking about and I don't know of any reason to think it's an accurate one.
@MP If this leaderboard is any guide, there was a 600-point gap between o1 (Dec/24) and GPT-5. Let's say AI improves 600 points per year. You can then imagine an LLM playing at super-GM level in 2028, and superhuman chess in 2029.