    AI Sucks at Sudoku, but Its Explanations Are Even Worse. Why That’s Worrisome

    By techupdateadmin | December 4, 2025 | 8 Mins Read
    [Image: A close-up of a hand with a pen filling in a Sudoku puzzle]

    Chatbots are genuinely impressive when you watch them do things they’re good at, like writing a basic email or creating weird, futuristic-looking images. But ask generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.

    That’s what researchers at the University of Colorado at Boulder found when they challenged large language models to solve sudoku. And not even the standard 9×9 puzzles. An easier 6×6 puzzle was often beyond the capabilities of an LLM without outside help (in this case, specific puzzle-solving tools).


    A more important finding came when the models were asked to show their work. For the most part, they couldn’t. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they hallucinated and started talking about the weather.

    If gen AI tools can’t explain their decisions accurately or transparently, that should cause us to be cautious as we give these things more control over our lives and decisions, said Ashutosh Trivedi, a computer science professor at the University of Colorado at Boulder and one of the authors of the paper published in July in the Findings of the Association for Computational Linguistics.

    “We would really like those explanations to be transparent and be reflective of why AI made that decision, and not AI trying to manipulate the human by providing an explanation that a human might like,” Trivedi said.


    The paper is part of a growing body of research into the behavior of large language models. Other recent studies have found, for example, that models hallucinate in part because their training procedures incentivize them to produce results a user will like, rather than what is accurate, or that people who use LLMs to help them write essays are less likely to remember what they wrote. As gen AI becomes more and more a part of our daily lives, the implications of how this technology works and how we behave when using it become hugely important.

    When you make a decision, you can try to justify it or at least explain how you arrived at it. An AI model may not be able to accurately or transparently do the same. Would you trust it?


    Why LLMs struggle with sudoku

    We’ve seen AI models fail at basic games and puzzles before. OpenAI’s ChatGPT (among others) has been totally crushed at chess by the computer opponent in a 1979 Atari game. A recent research paper from Apple found that models can struggle with other puzzles, like the Tower of Hanoi.

    It has to do with the way LLMs work: they fill gaps in information based on what happened in similar cases in their training data. A sudoku, though, is a question of logic. The model might fill each empty cell in order with whatever seems like a reasonable answer, but solving the puzzle properly means looking at the entire grid and finding a logical order of deductions that changes from puzzle to puzzle.
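
    To make that concrete, here is a minimal sketch (mine, not the researchers') of how a conventional program handles a 6×6 sudoku: as a constraint-satisfaction search over the whole grid, with backtracking when a guess turns out to be wrong, rather than a cell-by-cell fill based on what merely looks plausible.

# Illustrative only: a tiny backtracking solver for a 6x6 sudoku (2x3 boxes).
# The point is that solving requires checking constraints across the whole
# grid and undoing earlier guesses, not just picking a "reasonable" digit
# for each cell in turn.

def valid(grid, r, c, v):
    """True if value v can go at row r, column c without repeating in the
    row, the column, or the 2x3 box containing that cell."""
    if v in grid[r]:
        return False
    if any(grid[i][c] == v for i in range(6)):
        return False
    br, bc = r - r % 2, c - c % 3
    return all(grid[br + i][bc + j] != v for i in range(2) for j in range(3))

def solve(grid):
    """Fill empty cells (zeros) by depth-first search; mutates grid in place."""
    for r in range(6):
        for c in range(6):
            if grid[r][c] == 0:
                for v in range(1, 7):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0   # dead end: erase and try another value
                return False             # nothing fits in this cell, backtrack
    return True                          # no empty cells left: solved

    Even this toy version has to erase and revisit earlier choices, which is exactly the kind of global bookkeeping that a model predicting one plausible token after another isn't built to do.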


    Chatbots are bad at chess for a similar reason. They find logical next moves but don’t necessarily think three, four or five moves ahead — the fundamental skill needed to play chess well. Chatbots also sometimes tend to move chess pieces in ways that don’t really follow the rules or put pieces in meaningless jeopardy. 

    You might expect LLMs to be able to solve sudoku because they’re computers and the puzzle consists of numbers, but the puzzles themselves are not really mathematical; they’re symbolic. “Sudoku is famous for being a puzzle with numbers that could be done with anything that is not numbers,” said Fabio Somenzi, a professor at CU and one of the research paper’s authors.
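
    That symbolic character is easy to see in the sketch above: the solver never does arithmetic with the digits, it only asks whether a symbol already appears in a row, column or box. A hypothetical instance (invented here for illustration, not taken from the study) can be solved and then relabelled with letters:

# Hypothetical 6x6 instance; 0 marks an empty cell.
puzzle = [
    [1, 0, 3, 0, 5, 0],
    [0, 5, 0, 1, 0, 3],
    [2, 0, 1, 0, 6, 0],
    [0, 6, 0, 2, 0, 1],
    [3, 0, 2, 0, 4, 0],
    [0, 4, 0, 3, 0, 2],
]

if solve(puzzle):
    letters = " ABCDEF"              # 1..6 -> A..F; any six distinct symbols work
    for row in puzzle:
        print(" ".join(letters[v] for v in row))

    Swap the digits for letters, colors or shapes and nothing about the search changes, which is the point Somenzi is making: the puzzle tests logic, not arithmetic.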

    I used a sample prompt from the researchers’ paper and gave it to ChatGPT. The tool showed its work and repeatedly told me it had the answer before showing a puzzle that didn’t work, then going back and correcting it. It was like the bot was turning in a presentation that kept getting last-second edits: This is the final answer. No, actually, never mind, this is the final answer. It got the answer eventually, through trial and error. But trial and error isn’t a practical way for a person to solve a sudoku in the newspaper. That’s way too much erasing and ruins the fun.

    [Image: A robot plays chess against a person. AI and robots can be good at games if they’re built to play them, but general-purpose tools like large language models can struggle with logic puzzles. Credit: Ore Huiying/Bloomberg/Getty Images]

    AI struggles to show its work

    The Colorado researchers didn’t just want to see if the bots could solve puzzles. They asked for explanations of how the bots worked through them. Things did not go well.

    Testing OpenAI’s o1-preview reasoning model, the researchers saw that the explanations — even for correctly solved puzzles — didn’t accurately explain or justify their moves and got basic terms wrong. 

    “One thing they’re good at is providing explanations that seem reasonable,” said Maria Pacheco, an assistant professor of computer science at CU. “They align to humans, so they learn to speak like we like it, but whether they’re faithful to what the actual steps need to be to solve the thing is where we’re struggling a little bit.”

    Sometimes, the explanations were completely irrelevant. Since the paper was finished, the researchers have continued testing newly released models. Somenzi said that when he and Trivedi were running OpenAI’s o4 reasoning model through the same tests, at one point it seemed to give up entirely.

    “The next question that we asked, the answer was the weather forecast for Denver,” he said.

    (Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

    Better models are still bad at what matters

    The researchers at Colorado aren’t the only ones challenging language models with sudoku. Sakana AI has been testing how effective different models have been at solving the puzzles since May. Its leaderboard shows that newer models, particularly OpenAI’s GPT-5, have much better solve rates than their predecessors. GPT-5 was the first in these tests to solve a 9×9 modern sudoku problem variant called Theta. Still, LLMs struggle with actual reasoning, as opposed to computational problem-solving, the Sakana researchers wrote in a blog post. “While GPT-5 demonstrated impressive mathematical reasoning capabilities and human-like strategic thinking on algebraically-constrained puzzles, it struggled significantly with spatial reasoning challenges that require spatial understanding,” they wrote. 

    The Colorado research team also found that GPT-5 was a “significant step forward” but is still not very good at solving sudoku, and it remains bad at explaining how it arrives at a solution. In one test, the team found that the model’s explanation claimed it had placed a number that was already given in the puzzle.

    “Overall, our conclusions from the original study remain essentially unchanged: there has been progress in raw solving ability, but not yet in trustworthy, step-by-step explanations,” the Colorado team said in an email.

    Explaining yourself is an important skill

    When you solve a puzzle, you’re almost certainly able to walk someone else through your thinking. The fact that these LLMs failed so spectacularly at that basic job isn’t a trivial problem. With AI companies constantly talking about “AI agents” that can take actions on your behalf, being able to explain yourself is essential.

    Consider the types of jobs being given to AI now, or planned for in the near future: driving, doing taxes, deciding business strategies and translating important documents. Imagine what would happen if you, a person, did one of those things and something went wrong.

    “When humans have to put their face in front of their decisions, they better be able to explain what led to that decision,” Somenzi said.

    It isn’t just a matter of getting a reasonable-sounding answer. It needs to be accurate. One day, an AI’s explanation of itself might have to hold up in court, but how can its testimony be taken seriously if it’s known to lie? You wouldn’t trust a person who failed to explain themselves, and you also wouldn’t trust someone you found was saying what you wanted to hear instead of the truth. 

    “Having an explanation is very close to manipulation if it is done for the wrong reason,” Trivedi said. “We have to be very careful with respect to the transparency of these explanations.”
