AI produces Connections puzzles that rival human-created ones


Can artificial intelligence (AI) create word puzzles as engaging and challenging as those crafted by human experts?

A new study suggests the answer may be yes — at least when it comes to The New York Times' popular Connections game.

Researchers from NYU Tandon School of Engineering and Jester Labs have developed an AI system capable of generating novel Connections puzzles that often rival those created by Times puzzle designers. 

In a user study, participants played both AI-generated and official Times puzzles without knowing their source. In roughly half of head-to-head comparisons, players judged the AI puzzles to be as enjoyable, creative, and difficult as their human-created counterparts, or more so.

The findings shed light on the creative capabilities of large language models (LLMs) like GPT-4.

Connections, which debuted in June 2023, challenges players to sort 16 words into four thematically linked groups of four. The game quickly became one of the Times' most popular online offerings, second only to Wordle, with billions of plays per year.

To create AI-generated puzzles, the researchers employed an "agentic workflow" approach. This method involves using GPT-4 in multiple specialized roles throughout the puzzle creation process. 

Rather than asking the AI to generate an entire puzzle at once, researchers broke down the task into smaller, more focused steps. For each step, they prompted GPT-4 with specific instructions, effectively having it play different roles such as puzzle creator, editor, and difficulty assessor. 

This approach allowed the team to leverage the AI's capabilities more effectively by guiding it through a process that mimics how human designers might approach puzzle creation.
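
As a rough illustration of what such a multi-role workflow can look like, here is a minimal Python sketch. It is not the researchers' code: the function names, the prompts, and the `llm` callable (any function that takes a prompt string and returns text) are assumptions made for this example, and `stub_llm` is a canned stand-in so the sketch runs without a real model.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping a prompt to the model's reply.
LLM = Callable[[str], str]


def propose_group(llm: LLM, theme: str) -> List[str]:
    """Creator role: ask for four words that fit a category."""
    reply = llm(
        f"ROLE: puzzle creator. List four words, comma-separated, "
        f"that fit the category '{theme}'."
    )
    return [w.strip().upper() for w in reply.split(",") if w.strip()][:4]


def edit_group(llm: LLM, theme: str, words: List[str]) -> bool:
    """Editor role: accept or reject a proposed group."""
    reply = llm(
        f"ROLE: editor. Do all of these words fit '{theme}'? "
        f"Words: {', '.join(words)}. Answer YES or NO."
    )
    return reply.strip().upper().startswith("YES")


def rate_difficulty(llm: LLM, theme: str, words: List[str]) -> int:
    """Difficulty assessor role: return a rating clamped to 1..4."""
    reply = llm(
        f"ROLE: difficulty assessor. Rate the group '{theme}' "
        f"({', '.join(words)}) from 1 (easiest) to 4 (hardest). "
        f"Answer with a single digit."
    )
    digits = [c for c in reply if c.isdigit()]
    return min(4, max(1, int(digits[0]))) if digits else 1


def build_puzzle(llm: LLM, themes: List[str]) -> List[Dict]:
    """Run each theme through the three roles, keeping only valid groups."""
    groups: List[Dict] = []
    used = set()
    for theme in themes:
        words = propose_group(llm, theme)
        # A group needs four distinct words, none already used in the puzzle.
        if len(set(words)) != 4 or used & set(words):
            continue
        if not edit_group(llm, theme, words):
            continue
        groups.append({
            "theme": theme,
            "words": words,
            "difficulty": rate_difficulty(llm, theme, words),
        })
        used.update(words)
    return groups


def stub_llm(prompt: str) -> str:
    """Canned replies standing in for a real model (illustration only)."""
    if "puzzle creator" in prompt:
        return "Abbey, Mystery, Pepper, White"
    if "editor" in prompt:
        return "YES"
    return "2"
```

Calling `build_puzzle(stub_llm, ["Beatles Album Words"])` returns one accepted group. With a real model, each role would see only its own narrow instruction, which is the point of splitting the task.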

"We found that solving a complex problem like generating a Connections puzzle requires more than just asking an AI to do it," said Timothy Merino, a Ph.D. student in NYU Tandon’s Game Innovation Lab who is the lead author of the study. "By breaking the task into smaller, more manageable steps and using the LLM as a tool in various ways, we achieved better results."

The paper’s senior author, Julian Togelius — NYU Tandon associate professor of computer science and engineering, and the Director of the Game Innovation Lab — emphasized the importance of this approach. "The LLM is crucial to our system, but it's not in the driving seat. We use it in different parts of the system for specific tasks, like asking for the best concept that would apply to a particular list of words."

The researchers also identified two key ways puzzles introduce difficulty: "Intentional overlap" and "False groups." They analyzed word similarity in relation to difficulty levels, finding that easier word groups tend to have more similar words, while trickier groups have less similar words.
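
One way to picture the similarity comparison is to average a similarity score over every pair of words in a group. The Python sketch below assumes cosine similarity over word vectors; the article does not say which measure the researchers used, and the two-dimensional vectors in `TOY_VECTORS` are made-up numbers for illustration, not real embeddings.

```python
import math
from itertools import combinations
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(
    words: Sequence[str], vectors: Dict[str, Sequence[float]]
) -> float:
    """Average cosine similarity over all unordered word pairs in a group."""
    if len(words) < 2:
        raise ValueError("need at least two words to form a pair")
    pairs = list(combinations(words, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Made-up vectors: the colors point in nearly the same direction,
# while the second group is deliberately scattered.
TOY_VECTORS = {
    "RED": (1.0, 0.1), "BLUE": (0.9, 0.2),
    "GREEN": (1.0, 0.0), "PINK": (0.8, 0.1),
    "ABBEY": (1.0, 0.0), "MYSTERY": (0.0, 1.0),
    "PEPPER": (0.5, -1.0), "WHITE": (-1.0, 0.5),
}

easy = mean_pairwise_similarity(["RED", "BLUE", "GREEN", "PINK"], TOY_VECTORS)
tricky = mean_pairwise_similarity(
    ["ABBEY", "MYSTERY", "PEPPER", "WHITE"], TOY_VECTORS
)
```

With these toy numbers, `easy` comes out higher than `tricky`, matching the pattern described above: the more obvious group has words that sit closer together.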

“I was consistently surprised at how good GPT was at creating a clever word group,” said Merino. “One of my favorites the AI generated is ‘Beatles Album Words’: ‘Abbey,’ ‘Mystery,’ ‘Pepper,’ and ‘White.’”

The research has implications beyond word games, according to the researchers. It is a step toward better understanding both AI capabilities and human creativity.

"This work isn't just about generating puzzles," Togelius said. "It's about using AI to test and refine our theories of what makes a good puzzle in the first place. Connections is a worthy area of research because what makes a good game isn’t easy to define. We can refine our understanding of game design by creating theories of what makes for good games, implement them into algorithms, and see whether the games that are generated by the algorithms are actually good."

This recent paper builds upon the Game Innovation Lab's ongoing research into AI and Connections. In a study published earlier this year, the lab's researchers evaluated various AI models' proficiency in solving Connections puzzles. Their findings revealed that while GPT-4 outperformed other models, it still fell short of mastering the game, successfully solving only about 29 percent of the puzzles presented.


Merino, Tim & Earle, Sam & Sudhakaran, Ryan & Sudhakaran, Shyam & Togelius, Julian. (2024). Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game.