Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple declared that LLMs can't reason, they forgot one crucial detail: a hammer isn't meant to turn screws. In our study of Einstein's classic logic puzzle, we discovered something fascinating. While language models initially stumbled with pure reasoning, making amusing claims like "Plumbers don't drive Porsches", they excelled at an unexpected task.

Introduction to LLMs and the Reasoning Debate

A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. While there is some merit to this claim regarding out-of-the-box performance, this article demonstrates that with proper application, LLMs can indeed solve complex reasoning problems.

The Initial Experiment: Einstein’s Puzzle


We set out to test LLM reasoning capabilities using Einstein's puzzle, a classic logic problem involving five houses with different characteristics and 15 clues that together determine who owns the fish. Our initial tests with leading LLMs showed mixed results:

  • OpenAI’s model correctly guessed the answer, but without clear reasoning
  • Claude provided an incorrect answer
  • When we modified the puzzle with new elements (cars, hobbies, drinks, colors, and jobs), both models failed significantly

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

  1. Make guesses about house arrangements
  2. Use critics to evaluate rule violations
  3. Feed this information back for the next round

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed limitations, we discovered that LLMs could excel when used as code generators. We asked SCOTi to write MiniZinc code to solve the puzzle, resulting in a well-formed constraint programming solution. The key advantages of this approach were:

  1. Each rule could be cleanly translated into code statements
  2. The resulting code was highly readable
  3. MiniZinc could solve the puzzle efficiently
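
To make the first point concrete, here is a minimal sketch of the model skeleton that such generated code rests on. This is our reconstruction for illustration, not SCOTi's verbatim output: only Porsche, Music, Plumber, Pink and Orange are named in this article, so the remaining enum values are placeholders, and the drinks category would follow the same pattern.

include "alldifferent.mzn";

% One enum per attribute category; values other than those named in the
% article are placeholders.
enum Car   = { Porsche, CarB, CarC, CarD, CarE };
enum Hobby = { Music, HobbyB, HobbyC, HobbyD, HobbyE };
enum Job   = { Plumber, JobB, JobC, JobD, JobE };
enum Color = { Pink, Orange, ColorC, ColorD, ColorE };

% One decision variable per house position (1..5) and per category.
array[1..5] of var Car:   cars;
array[1..5] of var Hobby: hobbies;
array[1..5] of var Job:   jobs;
array[1..5] of var Color: colors;

% Each attribute value is used by exactly one house.
constraint alldifferent(cars);
constraint alldifferent(hobbies);
constraint alldifferent(jobs);
constraint alldifferent(colors);

% Clue constraints like the ones shown below are added here.
solve satisfy;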

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

% Statement 11: The man who enjoys Music lives next to the man who drives Porsche
% Note: /\ means AND in MiniZinc
constraint exists(i,j in 1..5)(abs(i-j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);
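
The adjacency rule that confused the critics earlier translates just as directly. A sketch, assuming the jobs and colors arrays from the skeleton above:

% "The Plumber lives next to the Pink house": some pair of adjacent houses
% holds the Plumber and the Pink house respectively.
constraint exists(i,j in 1..5)(abs(i-j) == 1 /\ jobs[i] = Plumber /\ colors[j] = Pink);

Because each clue becomes a single constraint, the solver handles the multi-step inference that tripped up the models when they reasoned directly.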

If you would like to get the full MiniZinc code, please contact me.

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

  1. Direct reasoning with complex logic can be challenging for LLMs
  2. Simple rule application works well, but performance degrades when multiple steps of inference are required
  3. LLMs excel when used as agents to generate code for solving logical problems
  4. The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that while LLMs may struggle with certain types of direct reasoning, they can be remarkably effective when applied as components in a larger system. Used strategically rather than as standalone reasoning engines, they become genuinely transformative tools for software development.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.

