When will LLMs be better at Paradox grand strategy games than the in-game AI for NPCs?

Resolution Criteria

This market resolves to the date when Large Language Models (LLMs) are demonstrably better at playing Paradox grand strategy games (such as Europa Universalis, Crusader Kings, Hearts of Iron, Stellaris, or Victoria) than the built-in AI that controls non-player characters (or nations).

The relevant Paradox games are those current at the time of resolution.

If Paradox integrates LLMs into the AI for NPCs, that counts as admitting that LLMs are better at the task, and this market will resolve to the date the relevant game (or patch, or DLC) is released to the public.

Otherwise, this market will resolve when there is publicly available code I can run, alongside a copy of one of the then-current generation of Paradox GSGs, which consistently plays the game well (in single-player mode). It doesn't need to achieve world conquest, or even play as well as any given human player would. But it needs to consistently avoid faceplanting. If it semi-consistently achieves success (relative to its starting position), the way even a significantly less-than-median human player can, that's enough to resolve the market.

The level of skill I'm talking about here is one a human player can reach within tens of hours of play time; this isn't meant to be a high bar.

The LLM-based AI can be specialized for playing Paradox games, or one particular game. It can be fine-tuned for the task, or include e.g. specialized tool-calling. I need to be able to run it against a game running on my computer (or in a virtual machine), but the model itself need not be a local one; i.e. it can call the API of a proprietary hosted LLM like Claude or GPT.
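For concreteness, a qualifying system would presumably look like an agent loop: read the game state, ask an LLM for an action, apply it, repeat. The sketch below is purely illustrative and every name in it is hypothetical — `read_game_state`, `apply_action`, and the stubbed `call_llm` stand in for real integrations (parsing save files or a mod's exports, issuing console or mod commands, and calling a hosted LLM API such as Claude or GPT).

```python
# Hypothetical sketch of an LLM agent loop for a grand strategy game.
# None of these functions correspond to a real Paradox or LLM-vendor API.

def read_game_state() -> dict:
    # Stand-in for parsing save files or a mod's exported state.
    return {"treasury": 120, "at_war": False, "stability": 1}

def call_llm(prompt: str) -> str:
    # Stub for a hosted-LLM API call; returns the name of an action.
    # A real system would send the prompt to a model and parse a tool call.
    if "at_war: False" in prompt:
        return "improve_economy"
    return "negotiate_peace"

def apply_action(state: dict, action: str) -> dict:
    # Stand-in for issuing commands back to the running game.
    if action == "improve_economy":
        state["treasury"] += 10
    return state

def play_turn(state: dict) -> dict:
    # One iteration of the agent loop: serialize state, query, act.
    prompt = "\n".join(f"{k}: {v}" for k, v in state.items())
    action = call_llm(prompt)
    return apply_action(state, action)
```

The point of the criterion is that such a loop, whatever its internals, must keep a nation viable over a full campaign rather than win a benchmark.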

As the resolution criteria are somewhat subjective, I will not bet on this market.
