Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple declared that LLMs can't reason, it overlooked one crucial detail: a hammer isn't meant to turn screws. In our study of Einstein's classic logic puzzle, language models initially stumbled at pure reasoning, making amusing claims like "Plumbers don't drive Porsches", but they excelled at an unexpected task: writing code that solves the puzzle.

By Oliver King-Smith, CEO and founder smartR AI
Last Updated: November 10, 2024

Introduction to LLMs and the Reasoning Debate

A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. While there is some merit to this claim regarding out-of-the-box performance, this article demonstrates that with proper application, LLMs can indeed solve complex reasoning problems.

The Initial Experiment: Einstein’s Puzzle

We set out to test LLM reasoning capabilities using Einstein’s puzzle, a complex logic problem involving 5 houses with different characteristics and 15 clues to determine who owns a fish. Our initial tests with leading LLMs showed mixed results:

OpenAI’s model correctly guessed the answer, but without clear reasoning
Claude provided an incorrect answer
When we modified the puzzle with new elements (cars, hobbies, drinks, colors, and jobs), both models failed significantly

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

Make guesses about house arrangements
Use critics to evaluate rule violations
Feed this information back for the next round
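
The three steps above can be sketched as a small search loop. This is a minimal illustration under stated assumptions, not the implementation used in the study: the LLM proposer and LLM critics are replaced by deterministic Python stand-ins (propose, next_to, same_house), the puzzle is cut down to three houses, and the Teacher, Baker, and Green values and the "Teacher in Orange" rule are invented for the example.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Optional, Tuple

# An arrangement maps each attribute to its values per house (index 0 = House 1).
Arrangement = Dict[str, Tuple[str, ...]]
Critic = Callable[[Arrangement], bool]  # True means the rule is satisfied


def next_to(attr_a: str, val_a: str, attr_b: str, val_b: str) -> Critic:
    """Critic: the house holding val_a is adjacent to the house holding val_b."""
    def check(arr: Arrangement) -> bool:
        return abs(arr[attr_a].index(val_a) - arr[attr_b].index(val_b)) == 1
    return check


def same_house(attr_a: str, val_a: str, attr_b: str, val_b: str) -> Critic:
    """Critic: val_a and val_b belong to the same house."""
    def check(arr: Arrangement) -> bool:
        return arr[attr_a].index(val_a) == arr[attr_b].index(val_b)
    return check


def violations(arr: Arrangement, critics: Dict[str, Critic]) -> List[str]:
    """Run every critic and return the names of the rules that are broken."""
    return [name for name, critic in critics.items() if not critic(arr)]


def propose(arr: Arrangement) -> List[Arrangement]:
    """Stand-in for the model's guesses: swap two houses within one attribute."""
    guesses = []
    for attr, values in arr.items():
        for i, j in itertools.combinations(range(len(values)), 2):
            swapped = list(values)
            swapped[i], swapped[j] = swapped[j], swapped[i]
            guesses.append({**arr, attr: tuple(swapped)})
    return guesses


def search(start: Arrangement, critics: Dict[str, Critic]) -> Optional[Arrangement]:
    """Expand the guess with the fewest violations until no rule is broken."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(len(violations(start, critics)), next(tie), start)]
    seen = {tuple(sorted(start.items()))}
    while heap:
        score, _, arr = heapq.heappop(heap)
        if score == 0:
            return arr
        for guess in propose(arr):
            key = tuple(sorted(guess.items()))
            if key not in seen:
                seen.add(key)
                heapq.heappush(heap, (len(violations(guess, critics)), next(tie), guess))
    return None


# Hypothetical three-house puzzle; only Plumber, Pink, and Orange come from the article.
START: Arrangement = {
    "jobs": ("Plumber", "Teacher", "Baker"),
    "colors": ("Pink", "Orange", "Green"),
}
CRITICS: Dict[str, Critic] = {
    "Plumber next to Pink": next_to("jobs", "Plumber", "colors", "Pink"),
    "Teacher in Orange": same_house("jobs", "Teacher", "colors", "Orange"),  # invented rule
}

solution = search(START, CRITICS)
print(solution, violations(solution, CRITICS))
```

Because the critics here are ordinary functions, their verdicts are always consistent. The failures described next arose when a language model played the critic role instead.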

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”
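
For comparison, the rule the critic was asked to check reduces to one line of arithmetic. The sketch below assumes houses are numbered 1 to 5 in a row; the function name lives_next_to is ours, not something from the study.

```python
def lives_next_to(house_a: int, house_b: int) -> bool:
    """True when two house numbers are adjacent; the same house does not count."""
    return abs(house_a - house_b) == 1


# The situation from the quoted response: the Plumber lives in the Pink house (House 2).
plumber_house = 2
pink_house = 2
print(lives_next_to(plumber_house, pink_house))  # the rule is violated, not satisfied
```

Living in the Pink house gives a distance of 0, so the rule fails. The model's response treated that case as a match.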

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed limitations, we discovered that LLMs could excel when used as code generators. We asked SCOTi to write MiniZinc code to solve the puzzle, resulting in a well-formed constraint programming solution. The key advantages of this approach were:

Each rule could be cleanly translated into code statements
The resulting code was highly readable
MiniZinc could solve the puzzle efficiently

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

% Statement 11: The man who enjoys Music lives next to the man who drives Porsche
% Note /\ means AND in minizinc
constraint exists(i,j in 1..5)(abs(i-j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);

If you would like to get the full MiniZinc code, please contact me.
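
For readers who do not know MiniZinc, Statement 11 can also be read as a Python predicate. This is a rough equivalent for illustration only, not part of the generated solution. The hobby and car names other than Music and Porsche are invented, and the brute-force count at the end merely stands in for the pruning a constraint solver does far more efficiently.

```python
from itertools import permutations
from typing import Sequence

HOUSES = range(5)  # index 0 = House 1


def statement_11(hobbies: Sequence[str], cars: Sequence[str]) -> bool:
    """Some Music lover lives in a house adjacent to a Porsche driver's house."""
    return any(
        abs(i - j) == 1 and hobbies[i] == "Music" and cars[j] == "Porsche"
        for i in HOUSES
        for j in HOUSES
    )


# Hypothetical values; only "Music" and "Porsche" appear in the article.
hobbies = ("Chess", "Music", "Golf", "Art", "Yoga")
car_names = ("Porsche", "Fiat", "Volvo", "Honda", "Ford")

# With Music fixed in House 2, count the car orderings that satisfy the rule.
allowed = sum(statement_11(hobbies, cars) for cars in permutations(car_names))
print(allowed, "of 120 car orderings satisfy Statement 11")
```

The Porsche has to be in House 1 or House 3, so this single rule already excludes 72 of the 120 car orderings.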

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

Direct reasoning with complex logic can be challenging for LLMs
Simple rule application works well, but performance degrades when multiple steps of inference are required
LLMs excel when used as agents to generate code for solving logical problems
The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that while LLMs may struggle with certain types of direct reasoning, they can be highly effective when applied as components in a larger system. In this case, the model translated the puzzle's rules into code and a constraint solver did the reasoning, which shows how much more LLMs can contribute when used strategically rather than as standalone reasoning engines.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.

Oliver King-Smith, CEO and founder smartR AI

Oliver King-Smith is CEO of smartR AI, a company which facilitates and empowers organizations to extract real value from their data in an ethical, responsible, and sustainable manner using cutting-edge AI technology.
