Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple declared that LLMs can't reason, they forgot one crucial detail: a hammer isn't meant to turn screws. In our study of Einstein's classic logic puzzle, language models initially stumbled at pure reasoning, making amusing claims like "Plumbers don't drive Porsches", but they excelled at a different task: generating code that a constraint solver can use to solve the puzzle.
Image Credit: SmartR AI

Introduction to LLMs and the Reasoning Debate

A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. While there is some merit to this claim regarding out-of-the-box performance, this article demonstrates that with proper application, LLMs can indeed solve complex reasoning problems.

The Initial Experiment: Einstein’s Puzzle


We set out to test LLM reasoning capabilities using Einstein’s puzzle, a complex logic problem involving 5 houses with different characteristics and 15 clues to determine who owns a fish. Our initial tests with leading LLMs showed mixed results:

  • OpenAI’s model correctly guessed the answer, but without clear reasoning
  • Claude provided an incorrect answer
  • When we modified the puzzle with new elements (cars, hobbies, drinks, colors, and jobs), both models failed significantly

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

  1. Make guesses about house arrangements
  2. Use critics to evaluate rule violations
  3. Feed this information back for the next round
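The three steps above can be sketched as a small propose/critique/feedback loop. This is only an illustration under stated assumptions: the proposer is a stub that cycles through permutations (a real system would call an LLM and condition on the feedback), the critics are deterministic functions rather than LLM critics, and the two rules are hypothetical toy rules, not clues from the puzzle. The names `refine`, `make_stub_proposer`, `pink_not_first`, and `orange_left_of_pink` are invented for this sketch.

```python
from itertools import cycle, permutations
from typing import Callable, Optional

Guess = tuple[str, ...]                    # colour of house 1, house 2, ...
Critic = Callable[[Guess], Optional[str]]  # violation message, or None if satisfied


def pink_not_first(guess: Guess) -> Optional[str]:
    # Hypothetical toy rule, not one of the article's clues.
    return "Pink must not be house 1" if guess[0] == "Pink" else None


def orange_left_of_pink(guess: Guess) -> Optional[str]:
    # Hypothetical toy rule: Orange sits immediately to the left of Pink.
    if guess.index("Orange") + 1 != guess.index("Pink"):
        return "Orange must be immediately left of Pink"
    return None


def refine(propose, critics, max_rounds=50):
    """Run the guess -> critique -> feedback loop until no critic objects."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        guess = propose(feedback)  # step 1: make a guess
        # step 2: collect every rule violation reported by the critics
        feedback = [msg for critic in critics if (msg := critic(guess)) is not None]
        if not feedback:           # no violations: accept the guess
            return guess
        # step 3: the feedback list is passed to the next call to propose
    return None


def make_stub_proposer(colours):
    """Stand-in for an LLM: ignores feedback and cycles through permutations."""
    candidates = cycle(permutations(colours))

    def propose(feedback):
        return next(candidates)

    return propose


solution = refine(
    make_stub_proposer(["Pink", "Orange", "Green"]),
    [pink_not_first, orange_left_of_pink],
)
print(solution)
```

Because the critics here are plain functions, they never misread a rule; the failures described next arose when the critics were themselves language models.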

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”
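To see why the response above is wrong, it helps to state "lives next to" as an explicit check. The following is a minimal sketch, assuming houses are numbered consecutively so that neighbours differ by exactly one; the function name `lives_next_to` is chosen here for illustration.

```python
def lives_next_to(house_a: int, house_b: int) -> bool:
    """True only when the two house numbers differ by exactly one."""
    return abs(house_a - house_b) == 1


# The critic's scenario: the Plumber is in House 2 and House 2 is the Pink house.
plumber_house = 2
pink_house = 2

print(lives_next_to(plumber_house, pink_house))  # same house, so not "next to"
print(lives_next_to(2, 1))                       # Houses 2 and 1 are neighbours
```

Living in the Pink house gives a distance of zero, so the rule is violated rather than satisfied.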

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed limitations, we discovered that LLMs could excel when used as code generators. We asked SCOTi to write MiniZinc code to solve the puzzle, resulting in a well-formed constraint programming solution. The key advantages of this approach were:

  1. Each rule could be cleanly translated into code statements
  2. The resulting code was highly readable
  3. MiniZinc could solve the puzzle efficiently

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

% Statement 11: The man who enjoys Music lives next to the man who drives Porsche
% Note /\ means AND in MiniZinc
constraint exists(i,j in 1..5)(abs(i-j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);
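For readers who do not know MiniZinc, the same existential constraint can be mirrored in Python and checked by naive enumeration, which is roughly the job the solver does far more efficiently. This is a sketch, not the generated model: it uses three houses instead of five to keep the search small, and every value other than Music and Porsche ("Chess", "Gardening", "CarB", "CarC") is a placeholder invented for this example.

```python
from itertools import permutations

HOBBIES = ["Music", "Chess", "Gardening"]  # placeholders except "Music"
CARS = ["Porsche", "CarB", "CarC"]         # placeholders except "Porsche"


def music_next_to_porsche(hobbies, cars) -> bool:
    """Mirror of the constraint: some adjacent pair (i, j) has Music at i and Porsche at j."""
    n = len(hobbies)
    return any(
        abs(i - j) == 1 and hobbies[i] == "Music" and cars[j] == "Porsche"
        for i in range(n)
        for j in range(n)
    )


# Enumerate every assignment of hobbies and cars to houses and keep the valid ones.
valid = [
    (hobbies, cars)
    for hobbies in permutations(HOBBIES)
    for cars in permutations(CARS)
    if music_next_to_porsche(hobbies, cars)
]

print(len(valid), "of", 6 * 6, "assignments satisfy the constraint")
```

Each additional rule becomes another predicate that filters the candidates. A constraint solver reaches the same result by pruning the search space instead of enumerating it.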

If you would like to get the full MiniZinc code, please contact me.

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

  1. Direct reasoning with complex logic can be challenging for LLMs
  2. Simple rule application works well, but performance degrades when multiple steps of inference are required
  3. LLMs excel when used as agents to generate code for solving logical problems
  4. The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that while LLMs may struggle with certain types of direct reasoning, they can be incredibly effective when properly applied as components in a larger system. This represents a significant advancement in software development capabilities, demonstrating how LLMs can be transformative when used strategically rather than as standalone reasoning engines.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.

