Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple declared that LLMs can't reason, they forgot one crucial detail: a hammer isn't meant to turn screws. In our groundbreaking study of Einstein's classic logic puzzle, we discovered something fascinating. While language models initially stumbled with pure reasoning - making amusing claims like "Plumbers don't drive Porsches" - they excelled at an unexpected task.

By Oliver King-Smith, CEO and founder smartR AI
Last Updated: November 10, 2024

Introduction to LLMs and the Reasoning Debate

A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. While there is some merit to this claim regarding out-of-the-box performance, this article demonstrates that with proper application, LLMs can indeed solve complex reasoning problems.

The Initial Experiment: Einstein’s Puzzle

We set out to test LLM reasoning capabilities using Einstein’s puzzle, a complex logic problem involving 5 houses with different characteristics and 15 clues to determine who owns a fish. Our initial tests with leading LLMs showed mixed results:

OpenAI’s model correctly guessed the answer, but without clear reasoning
Claude provided an incorrect answer
When we modified the puzzle with new elements (cars, hobbies, drinks, colors, and jobs), both models failed significantly

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

Make guesses about house arrangements
Use critics to evaluate rule violations
Feed this information back for the next round
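The loop above can be sketched in a few lines of Python. This is a simplified, single-path version (not a full tree search, and not the implementation used in the study): in the experiment both the guesser and the critics were LLM calls, whereas here they are plain functions. The names `refine`, `stub_propose`, and `next_to_critic` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Arrangement = Dict[str, int]                      # item name -> house number
Critic = Callable[[Arrangement], Optional[str]]   # returns a complaint, or None


def refine(propose: Callable[[List[str]], Arrangement],
           critics: List[Critic],
           max_rounds: int = 10) -> Optional[Arrangement]:
    """Guess, collect critic complaints, and feed them into the next guess."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        guess = propose(feedback)
        # Keep only the complaints; a critic returns None when its rule holds.
        feedback = [msg for critic in critics
                    if (msg := critic(guess)) is not None]
        if not feedback:
            return guess          # no critic objected
    return None                   # gave up after max_rounds


def next_to_critic(guess: Arrangement) -> Optional[str]:
    """Stand-in for an LLM critic checking one rule."""
    if abs(guess["Plumber"] - guess["Pink"]) != 1:
        return "The Plumber must live next to the Pink house."
    return None


# Stand-in for the LLM guesser: replays two canned guesses.
_canned = iter([{"Plumber": 2, "Pink": 2}, {"Plumber": 2, "Pink": 1}])


def stub_propose(feedback: List[str]) -> Arrangement:
    return next(_canned)


result = refine(stub_propose, [next_to_critic])
```

The first canned guess is rejected by the critic and the second is accepted, so the loop stops after two rounds. The quality of such a loop depends entirely on how reliable the critics are.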

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”
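The rule itself has a precise meaning that is easy to state in code: two houses are neighbours only when their numbers differ by exactly one. A minimal Python sketch, using a hypothetical helper name and the house numbers from the quoted response:

```python
def lives_next_to(house_a: int, house_b: int) -> bool:
    """True only when the two house numbers differ by exactly one."""
    return abs(house_a - house_b) == 1


# The critic's scenario: the Plumber is in House 2, and House 2 is the Pink house.
plumber_house = 2
pink_house = 2

same_house = lives_next_to(plumber_house, pink_house)   # False: same house
adjacent = lives_next_to(plumber_house, 1)              # True: House 1 is a neighbour
```

Under this definition the critic's arrangement violates the rule, because living in the Pink house is not the same as living next to it.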

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed limitations, we discovered that LLMs could excel when used as code generators. We asked SCOTi to write MiniZinc code to solve the puzzle, resulting in a well-formed constraint programming solution. The key advantages of this approach were:

Each rule could be cleanly translated into code statements
The resulting code was highly readable
MiniZinc could solve the puzzle efficiently
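To illustrate the idea of one rule per code statement, here is a toy Python sketch. It is not the generated MiniZinc model: it uses an invented three-house puzzle (only the Plumber/Pink rule comes from this article), and a plain exhaustive search stands in for the constraint solver, which would use far more efficient techniques.

```python
from itertools import permutations

# Toy values: only "Plumber" and "Pink" appear in the article.
JOBS = ("Plumber", "Teacher", "Baker")
COLORS = ("Pink", "Orange", "Blue")

# Each rule is one predicate over (jobs, colors); index = house position.
RULES = [
    # The Plumber lives next to the Pink house.
    lambda jobs, colors: abs(jobs.index("Plumber") - colors.index("Pink")) == 1,
    # Invented rule: the Teacher lives in the Orange house.
    lambda jobs, colors: colors[jobs.index("Teacher")] == "Orange",
    # Invented rule: the Baker lives in the first house.
    lambda jobs, colors: jobs[0] == "Baker",
]


def solve():
    """Return every (jobs, colors) assignment that satisfies all rules."""
    return [
        (jobs, colors)
        for jobs in permutations(JOBS)
        for colors in permutations(COLORS)
        if all(rule(jobs, colors) for rule in RULES)
    ]


solutions = solve()
# → [(('Baker', 'Plumber', 'Teacher'), ('Pink', 'Blue', 'Orange'))]
```

The checking is fully deterministic, so no bias or misreading of a rule can creep in once the rules are written down correctly.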

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

% Statement 11: The man who enjoys Music lives next to the man who drives Porsche
% Note: /\ means AND in MiniZinc
constraint exists(i,j in 1..5)(abs(i-j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);
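For readers less familiar with MiniZinc, the same rule can be mirrored in Python, with `any` playing the role of `exists`. This is only an illustrative equivalent, assuming `hobbies` and `cars` are lists indexed by house position; the function name and the placeholder values `h2` and `c1` are hypothetical.

```python
def music_next_to_porsche(hobbies, cars):
    """True if some Music house is adjacent to some Porsche house."""
    n = len(hobbies)
    return any(
        abs(i - j) == 1 and hobbies[i] == "Music" and cars[j] == "Porsche"
        for i in range(n)
        for j in range(n)
    )


# Placeholder values stand in for the other hobbies and cars.
hobbies = ["Music", "h2", "h3", "h4", "h5"]
satisfied = music_next_to_porsche(hobbies, ["c1", "Porsche", "c3", "c4", "c5"])
violated = music_next_to_porsche(hobbies, ["c1", "c2", "Porsche", "c4", "c5"])
```

In the first call the Porsche is one house away from the Music lover, so the rule holds; in the second it is two houses away, so it does not.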

If you would like to get the full MiniZinc code, please contact me.

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

Direct reasoning with complex logic can be challenging for LLMs
Simple rule application works well, but performance degrades when multiple steps of inference are required
LLMs excel when used as agents to generate code for solving logical problems
The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that while LLMs may struggle with certain types of direct reasoning, they can be incredibly effective when properly applied as components in a larger system. This represents a significant advancement in software development capabilities, demonstrating how LLMs can be transformative when used strategically rather than as standalone reasoning engines.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.


Oliver King-Smith, CEO and founder smartR AI

Oliver King-Smith is CEO of smartR AI, a company which facilitates and empowers organizations to extract real value from their data in an ethical, responsible, and sustainable manner using cutting-edge AI technology.
