Every month for around the last 10 years, Jane Street (a trading firm) has released a difficult puzzle on their website: https://www.janestreet.com/puzzles/archive/.
Right now, the best publicly accessible AI models (GPT-4 and Gemini Ultra) are not very good at these. I ran the February puzzle through both: GPT-4 gave a few definitions and then said the problem was complex (though it did correctly simulate it afterward, even though the problem asked for an exact answer), and Gemini Ultra wasn't even close.
During which years will a publicly accessible AI be able to solve at least 6 of the 12 puzzles released during that year? (Resolves YES for each year in which this happens. Multiple years can resolve YES.)
Clarifications
Must be a general-purpose AI model, not AlphaGeometry or something
Publicly accessible = reasonably accessible by an average interested member of the public
Puzzles must be solved with minimal human input, aside from maybe "Let's think step by step" or something. I want to basically just copy-paste the puzzle and have it give a solution.
The model is not allowed to search for the solution or copy from a similar puzzle; it must clearly be solving the puzzle itself.
Different AIs can solve different puzzles, as long as each one is released before the end of the month of the puzzle it is solving and is still general-purpose. (If GPT-5 can solve all the puzzles and is released in October of this year, it can't retroactively count for the earlier puzzles.)
Resolves N/A if the puzzles stop being published.
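For anyone who wants the counting rule spelled out, here is a rough Python sketch of how I read the clarifications above. It is only an illustration, not an official resolver: the names `Solve`, `counts`, and `resolves_yes` are made up for this example, and it assumes "released before the end of the month" means on or before the last calendar day of the puzzle's month. Whether a model is general-purpose and actually solved the puzzle is still a judgment call, so it is passed in as data.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Solve:
    """One puzzle that some model solved (hypothetical record type)."""
    puzzle_year: int
    puzzle_month: int
    model_release: date           # public release date of the model
    general_purpose: bool = True  # judgment call made by the resolver


def end_of_month(year: int, month: int) -> date:
    """Last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def counts(solve: Solve) -> bool:
    """A solve counts only if the model is general-purpose and was released
    by the end of the puzzle's month (assumed reading of the rule)."""
    deadline = end_of_month(solve.puzzle_year, solve.puzzle_month)
    return solve.general_purpose and solve.model_release <= deadline


def resolves_yes(year: int, solves: list[Solve], threshold: int = 6) -> bool:
    """True if at least `threshold` distinct monthly puzzles from `year`
    have a counting solve. Different models may cover different months."""
    months = {s.puzzle_month for s in solves
              if s.puzzle_year == year and counts(s)}
    return len(months) >= threshold


# A model released in January that solves the January to June puzzles.
early = [Solve(2024, m, date(2024, 1, 15)) for m in range(1, 7)]
# A model released in October that solves all 12 puzzles: only the
# October, November, and December puzzles count.
late = [Solve(2024, m, date(2024, 10, 1)) for m in range(1, 13)]

print(resolves_yes(2024, early))  # True
print(resolves_yes(2024, late))   # False
```

The second case is the GPT-5 example from the clarification: a model released in October can't retroactively pick up the earlier months.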
IMO this would require some really high quality planning & CoT architecture that doesn’t seem achievable for a general public AI model in the next 1-2 years. E.g. the Feb 2025 puzzle requires (1) a hypothesis about how the features of the puzzle are related, (2) a lot of exploration to find “the trick” of connecting the clues to the answers, and (3) an intuition for how to stitch the answers together to derive the final puzzle answer. Right now LLM/transformer-based models just don’t seem to have the creative knack to solve more than half of these kinds of problems. Could be wrong.
@pricemaker It's tough, but I do think the advent of reasoning models in 2024 helped the models go from "completely hopeless" to "making some genuine attempts". So who knows what the next generation will be able to do.
@ZoravurSingh This is unlinked MC, so if you don’t think it will happen before 2028 you can bet NO on the pre-2028 options. I didn’t want to add too many years because I think it’s more difficult to have a good prediction the farther out you go.
@ahalekelly Sure. It can't be a back-and-forth thing with the human, but it can use Code Interpreter. I wouldn't expect it to help a ton though; I think the puzzles are mostly not easily brute-forceable in that way.