Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple declared that LLMs can't reason, they overlooked one crucial detail: a hammer isn't meant to turn screws. In our study of Einstein's classic logic puzzle, language models initially stumbled at pure reasoning, making amusing claims like "Plumbers don't drive Porsches", but they excelled at a different task: writing code that a constraint solver can run to solve the puzzle.

By Oliver King-Smith, CEO and founder of smartR AI
Last Updated: November 10, 2024

Introduction to LLMs and the Reasoning Debate

A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. While there is some merit to this claim regarding out-of-the-box performance, this article demonstrates that with proper application, LLMs can indeed solve complex reasoning problems.

The Initial Experiment: Einstein’s Puzzle

We set out to test LLM reasoning capabilities using Einstein’s puzzle, a complex logic problem involving 5 houses with different characteristics and 15 clues to determine who owns a fish. Our initial tests with leading LLMs showed mixed results:

OpenAI’s model correctly guessed the answer, but without clear reasoning
Claude provided an incorrect answer
When we modified the puzzle with new elements (cars, hobbies, drinks, colors, and jobs), both models failed significantly

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

Make guesses about house arrangements
Use critics to evaluate rule violations
Feed this information back for the next round
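The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the `propose` generator is a hypothetical stand-in for the LLM call (it simply enumerates arrangements instead of reading the feedback), the puzzle is shrunk to three houses, and every name other than Plumber, Pink, and Orange is invented for the example.

```python
from itertools import permutations
from typing import Callable, Iterator, Optional

# Toy three-house puzzle. "Teacher", "Doctor", and "Green" are invented here.
JOBS = ("Plumber", "Teacher", "Doctor")
COLORS = ("Pink", "Orange", "Green")

# A guess maps each attribute to its value per house (index 0 = House 1).
Guess = dict[str, tuple[str, ...]]
Critic = Callable[[Guess], Optional[str]]


def plumber_next_to_pink(guess: Guess) -> Optional[str]:
    """Critic for 'The Plumber lives next to the Pink house'."""
    gap = abs(guess["jobs"].index("Plumber") - guess["colors"].index("Pink"))
    return None if gap == 1 else "Plumber is not next to the Pink house"


def orange_is_first(guess: Guess) -> Optional[str]:
    """Critic for an invented rule: 'The Orange house is House 1'."""
    return None if guess["colors"][0] == "Orange" else "Orange is not House 1"


def doctor_in_green(guess: Guess) -> Optional[str]:
    """Critic for an invented rule: 'The Doctor lives in the Green house'."""
    same = guess["jobs"].index("Doctor") == guess["colors"].index("Green")
    return None if same else "Doctor is not in the Green house"


CRITICS: list[Critic] = [plumber_next_to_pink, orange_is_first, doctor_in_green]


def propose() -> Iterator[Guess]:
    """Hypothetical stand-in for the LLM: yields every arrangement in turn.
    A real proposer would receive the previous round's feedback as a prompt."""
    for jobs in permutations(JOBS):
        for colors in permutations(COLORS):
            yield {"jobs": jobs, "colors": colors}


def run_rounds(max_rounds: int = 50) -> tuple[Optional[Guess], list[list[str]]]:
    """Guess, critique, and record feedback until no critic objects."""
    history: list[list[str]] = []
    for _, guess in zip(range(max_rounds), propose()):
        violations = [msg for critic in CRITICS if (msg := critic(guess))]
        history.append(violations)  # this is what would be fed back to the model
        if not violations:
            return guess, history
    return None, history


solution, history = run_rounds()
print(solution, "after", len(history), "rounds")
```

Because the critics here are ordinary functions, their verdicts are deterministic. In the experiment the critics were themselves LLM calls, which is where the failures described next came from.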

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”
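For contrast, the rule itself is trivial to check deterministically. The sketch below encodes the arrangement from the quoted response (Plumber and Pink both in House 2, Orange in House 1) and shows that "next to" fails when both attributes sit in the same house. The function and variable names are illustrative only.

```python
def lives_next_to(house_a: int, house_b: int) -> bool:
    """Two houses are neighbors only if their numbers differ by exactly 1."""
    return abs(house_a - house_b) == 1


# Arrangement described in the critic's response above.
plumber_house = 2
pink_house = 2
orange_house = 1

# Same house means a gap of 0, so the rule is violated, not satisfied.
rule_holds = lives_next_to(plumber_house, pink_house)
print(rule_holds)  # False

# The critic conflated this with the Orange neighbor, which is adjacent to
# the Plumber but is not the house the rule refers to.
print(lives_next_to(plumber_house, orange_house))  # True
```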

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed limitations, we discovered that LLMs could excel when used as code generators. We asked SCOTi to write MiniZinc code to solve the puzzle, resulting in a well-formed constraint programming solution. The key advantages of this approach were:

Each rule could be cleanly translated into code statements
The resulting code was highly readable
MiniZinc could solve the puzzle efficiently
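To illustrate why the translation is so clean, the sketch below shows that a "lives next to" rule maps onto a fixed MiniZinc template with only the names changing. It is a hypothetical helper written for this article, not part of the generated solution; the array names `jobs` and `colors` are assumptions, and `/\` is MiniZinc's logical AND.

```python
def next_to_constraint(
    array_a: str, value_a: str, array_b: str, value_b: str, n_houses: int = 5
) -> str:
    """Render a 'value_a lives next to value_b' rule as a MiniZinc constraint."""
    return (
        f"constraint exists(i,j in 1..{n_houses})"
        f"(abs(i-j) == 1 /\\ {array_a}[i] = {value_a} /\\ {array_b}[j] = {value_b});"
    )


# "The Plumber lives next to the Pink house"
plumber_rule = next_to_constraint("jobs", "Plumber", "colors", "Pink")
print(plumber_rule)
```

A mapping this mechanical leaves little room for the model to bring in outside assumptions, which may be part of why code generation fared better than free-form reasoning.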

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

% Statement 11: The man who enjoys Music lives next to the man who drives a Porsche
% Note: /\ means AND in MiniZinc
constraint exists(i,j in 1..5)(abs(i-j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);
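For readers unfamiliar with MiniZinc, the same constraint can be read as a plain Python predicate. This is only an illustrative sketch: it checks a fixed arrangement rather than searching for one, it uses 0-based indices where MiniZinc's `1..5` is 1-based, and every hobby and car other than Music and Porsche is a placeholder.

```python
def statement_11(hobbies: list[str], cars: list[str]) -> bool:
    """True if some house with Music is adjacent to some house with a Porsche."""
    n = len(hobbies)
    return any(
        abs(i - j) == 1 and hobbies[i] == "Music" and cars[j] == "Porsche"
        for i in range(n)
        for j in range(n)
    )


# Placeholder arrangements; Music is always in the second house.
hobbies = ["Chess", "Music", "Golf", "Art", "Yoga"]
porsche_adjacent = ["Porsche", "Ford", "Fiat", "Audi", "Saab"]
porsche_distant = ["Ford", "Fiat", "Audi", "Porsche", "Saab"]
porsche_same_house = ["Ford", "Porsche", "Fiat", "Audi", "Saab"]

print(statement_11(hobbies, porsche_adjacent))    # True
print(statement_11(hobbies, porsche_distant))     # False
print(statement_11(hobbies, porsche_same_house))  # False
```

The third case is the one the LLM critics got wrong earlier: sharing a house does not count as living next door.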

If you would like to get the full MiniZinc code, please contact me.

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

Direct reasoning with complex logic can be challenging for LLMs
Simple rule application works well, but performance degrades when multiple steps of inference are required
LLMs excel when used as agents to generate code for solving logical problems
The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that while LLMs may struggle with certain types of direct reasoning, they can be incredibly effective when properly applied as components in a larger system. This represents a significant advancement in software development capabilities, demonstrating how LLMs can be transformative when used strategically rather than as standalone reasoning engines.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.


Oliver King-Smith, CEO and founder of smartR AI

Oliver King-Smith is CEO of smartR AI, a company that helps organizations extract real value from their data in an ethical, responsible, and sustainable manner using cutting-edge AI technology.
