
Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple declared that LLMs can't reason, they forgot one crucial detail: a hammer isn't meant to turn screws. In our study of Einstein's classic logic puzzle, we discovered something fascinating. While language models initially stumbled with pure reasoning, making amusing claims like "Plumbers don't drive Porsches", they excelled at an unexpected task: generating code that a constraint solver can use to solve the puzzle.

By Oliver King-Smith, CEO and founder smartR AI
Last Updated: November 10, 2024

Introduction to LLMs and the Reasoning Debate

A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. While there is some merit to this claim regarding out-of-the-box performance, this article demonstrates that with proper application, LLMs can indeed solve complex reasoning problems.

The Initial Experiment: Einstein’s Puzzle

We set out to test LLM reasoning capabilities using Einstein’s puzzle, a complex logic problem involving 5 houses with different characteristics and 15 clues to determine who owns a fish. Our initial tests with leading LLMs showed mixed results:

OpenAI’s model correctly guessed the answer, but without clear reasoning
Claude provided an incorrect answer
When we modified the puzzle with new elements (cars, hobbies, drinks, colors, and jobs), both models failed significantly

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

Make guesses about house arrangements
Use critics to evaluate rule violations
Feed this information back for the next round
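The guess, critique, and feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in the study: the names refine, propose, and critics are assumptions, the proposer is a stub standing in for an LLM call, and the "Doctor" job and the House 1 rule are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Optional

# An arrangement maps an attribute name to its value in each house, in order.
Arrangement = Dict[str, List[str]]
# A critic returns a violation message, or None when its rule is satisfied.
Critic = Callable[[Arrangement], Optional[str]]


def refine(propose: Callable[[List[str]], Arrangement],
           critics: List[Critic],
           max_rounds: int = 10) -> Optional[Arrangement]:
    """Repeat guess -> critique -> feedback until no critic objects."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        guess = propose(feedback)          # step 1: make a guess
        feedback = []
        for critic in critics:             # step 2: critics check the rules
            message = critic(guess)
            if message is not None:
                feedback.append(message)
        if not feedback:                   # no violations: accept the guess
            return guess
        # step 3: the violations are passed to propose() in the next round
    return None


def plumber_in_house_one(arrangement: Arrangement) -> Optional[str]:
    """Hypothetical rule, used only to exercise the loop."""
    if arrangement["jobs"][0] == "Plumber":
        return None
    return "The Plumber must live in House 1"


def stub_propose(feedback: List[str]) -> Arrangement:
    """Stand-in for an LLM call: corrects itself once it receives feedback."""
    if feedback:
        return {"jobs": ["Plumber", "Doctor"]}
    return {"jobs": ["Doctor", "Plumber"]}


result = refine(stub_propose, [plumber_in_house_one])
print(result)
```

In this sketch the critics are ordinary functions, so their verdicts are reliable. In the experiment the critics were themselves LLM calls, which is where the failures described next crept in.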

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”
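For contrast, a deterministic check of the same rule is short. The sketch below is an illustration under assumptions: the helper name lives_next_to is hypothetical, and only the two houses mentioned in the quoted response are modeled. It shows that living in the Pink house does not count as living next to it.

```python
def lives_next_to(jobs, colors, job, color):
    """True only when the two houses are exactly one position apart."""
    return abs(jobs.index(job) - colors.index(color)) == 1


# The fragment described in the quoted response: House 1 is Orange,
# House 2 is Pink, and the Plumber lives in House 2.
quoted_colors = ["Orange", "Pink"]
quoted_jobs = [None, "Plumber"]   # None marks a house whose job is not given

# Same house means a distance of 0, so the rule is violated.
print(lives_next_to(quoted_jobs, quoted_colors, "Plumber", "Pink"))

# Moving the Plumber to House 1 puts them one house away from Pink.
print(lives_next_to(["Plumber", None], quoted_colors, "Plumber", "Pink"))
```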

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed limitations, we discovered that LLMs could excel when used as code generators. We asked SCOTi to write MiniZinc code to solve the puzzle, resulting in a well-formed constraint programming solution. The key advantages of this approach were:

Each rule could be cleanly translated into code statements
The resulting code was highly readable
MiniZinc could solve the puzzle efficiently

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

% Statement 11: The man who enjoys Music lives next to the man who drives a Porsche
% Note: /\ means AND in MiniZinc
constraint exists(i, j in 1..5)(abs(i-j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);

If you would like to get the full MiniZinc code, please contact me.
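For readers less familiar with MiniZinc, the logic of Statement 11 can be mirrored in Python. This is a rough equivalent for illustration only, not part of the generated solution: the function name statement_11 is an assumption, and None marks houses whose values are left unspecified. Unlike MiniZinc, which searches for values that satisfy the constraint, this sketch only checks an arrangement that is already filled in.

```python
from itertools import product


def statement_11(hobbies, cars):
    """Some house i has Music and some adjacent house j has the Porsche."""
    n = len(hobbies)
    return any(
        abs(i - j) == 1 and hobbies[i] == "Music" and cars[j] == "Porsche"
        for i, j in product(range(n), repeat=2)
    )


# Music is the hobby in House 2 (index 1) in both examples.
hobbies = [None, "Music", None, None, None]

adjacent_cars = [None, None, "Porsche", None, None]   # Porsche in House 3
distant_cars = [None, None, None, None, "Porsche"]    # Porsche in House 5

print(statement_11(hobbies, adjacent_cars))  # neighbours: rule satisfied
print(statement_11(hobbies, distant_cars))   # three houses apart: rule violated
```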

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

Direct reasoning with complex logic can be challenging for LLMs
Simple rule application works well, but performance degrades when multiple steps of inference are required
LLMs excel when used as agents to generate code for solving logical problems
The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that while LLMs may struggle with certain types of direct reasoning, they can be incredibly effective when properly applied as components in a larger system. This represents a significant advancement in software development capabilities, demonstrating how LLMs can be transformative when used strategically rather than as standalone reasoning engines.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.


Oliver King-Smith, CEO and founder smartR AI

Oliver King-Smith is CEO of smartR AI, a company which facilitates and empowers organizations to extract real value from their data in an ethical, responsible, and sustainable manner using cutting-edge AI technology.
