Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple declared that LLMs can't reason, they forgot one crucial detail: a hammer isn't meant to turn screws. In our study of Einstein's classic logic puzzle, we found that language models initially stumbled with pure reasoning, making amusing claims like "Plumbers don't drive Porsches", yet they excelled at an unexpected task: writing code that solves the puzzle.

By Oliver King-Smith, CEO and founder of smartR AI
Last Updated: November 10, 2024

Introduction to LLMs and the Reasoning Debate

A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. While there is some merit to this claim regarding out-of-the-box performance, this article demonstrates that with proper application, LLMs can indeed solve complex reasoning problems.

The Initial Experiment: Einstein’s Puzzle

We set out to test LLM reasoning capabilities using Einstein’s puzzle, a complex logic problem involving 5 houses with different characteristics and 15 clues to determine who owns a fish. Our initial tests with leading LLMs showed mixed results:

OpenAI’s model correctly guessed the answer, but without clear reasoning
Claude provided an incorrect answer
When we modified the puzzle with new elements (cars, hobbies, drinks, colors, and jobs), both models failed significantly

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

Make guesses about house arrangements
Use critics to evaluate rule violations
Feed this information back for the next round
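The loop above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the study: the three-house puzzle, its two rules, and the names `propose`, `CRITICS`, and `solve` are all hypothetical. In the study an LLM played both the guesser and the critics; here a plain enumeration stands in for the model's guesses and ordinary Python functions stand in for the critics.

```python
from itertools import permutations

# Hypothetical toy puzzle: 3 houses, 2 rules. The study used 5 houses and 15 rules.
COLORS = ["Green", "Blue", "Red"]


def critic_red_first(guess):
    """Rule 1: the Red house is House 1. Returns a violation message or None."""
    return None if guess[0] == "Red" else "Rule 1 violated: Red is not House 1"


def critic_blue_next_to_red(guess):
    """Rule 2: the Blue house is next to the Red house."""
    gap = abs(guess.index("Blue") - guess.index("Red"))
    return None if gap == 1 else "Rule 2 violated: Blue is not next to Red"


CRITICS = [critic_red_first, critic_blue_next_to_red]


def propose(feedback, candidates):
    """Stand-in for the model's guess. A real system would put `feedback`
    into the next prompt; this stub ignores it and tries the next arrangement."""
    return next(candidates, None)


def solve():
    candidates = permutations(COLORS)
    feedback = []
    rounds = 0
    while True:
        guess = propose(feedback, candidates)
        if guess is None:
            return None, rounds  # ran out of arrangements
        rounds += 1
        # Each critic checks one rule; violations become the next round's feedback.
        feedback = [msg for msg in (critic(guess) for critic in CRITICS) if msg]
        if not feedback:
            return list(guess), rounds


solution, rounds = solve()
print(solution, rounds)
```

The structure makes the weak point visible: the loop is only as reliable as its critics, so a critic that misjudges a rule sends misleading feedback into every later round.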

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”
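For contrast, a deterministic check of the same rule cannot make this mistake. The snippet below is only an illustration; the function name `lives_next_to` and the 1-based house numbers are assumptions, not part of the study's code. It treats "next to" as house numbers that differ by exactly one, so the arrangement in the quoted response (Plumber and Pink house both at House 2) is rejected.

```python
def lives_next_to(house_a: int, house_b: int) -> bool:
    """True only when the two house numbers differ by exactly 1.

    Living in a house does not count as living next to it.
    """
    return abs(house_a - house_b) == 1


# The arrangement from the quoted critic response: both at House 2.
plumber_house = 2
pink_house = 2
print(lives_next_to(plumber_house, pink_house))  # False: same house, rule violated

# A neighbouring house on either side satisfies the rule.
print(lives_next_to(2, 1))  # True
print(lives_next_to(2, 3))  # True
```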

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed limitations, we discovered that LLMs could excel when used as code generators. We asked SCOTi to write MiniZinc code to solve the puzzle, resulting in a well-formed constraint programming solution. The key advantages of this approach were:

Each rule could be cleanly translated into code statements
The resulting code was highly readable
MiniZinc could solve the puzzle efficiently

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

% Statement 11: The man who enjoys Music lives next to the man who drives a Porsche
% Note: /\ means AND in MiniZinc
constraint exists(i, j in 1..5)(abs(i-j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);
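For readers who do not know MiniZinc, the same rule can be read as a predicate over one candidate arrangement. The Python below is a sketch for illustration only: the list-based representation is an assumption, and every hobby and car name other than Music and Porsche is a hypothetical placeholder, since the full puzzle is not reproduced in this article.

```python
def music_next_to_porsche(hobbies, cars):
    """Mirror of the MiniZinc constraint: there exist houses i and j,
    exactly one apart, with Music as the hobby at i and a Porsche at j."""
    n = len(hobbies)
    return any(
        abs(i - j) == 1 and hobbies[i] == "Music" and cars[j] == "Porsche"
        for i in range(n)
        for j in range(n)
    )


# Placeholder arrangements; only "Music" and "Porsche" come from the article.
hobbies = ["Hobby A", "Music", "Hobby B", "Hobby C", "Hobby D"]
cars_adjacent = ["Car A", "Car B", "Porsche", "Car C", "Car D"]    # Porsche one house over
cars_same_house = ["Car A", "Porsche", "Car B", "Car C", "Car D"]  # Porsche in the Music house

print(music_next_to_porsche(hobbies, cars_adjacent))
print(music_next_to_porsche(hobbies, cars_same_house))
```

The difference is that this predicate only tests an arrangement it is given, whereas the MiniZinc constraint is declarative and the solver searches for arrangements that satisfy every constraint at once.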

If you would like to get the full MiniZinc code, please contact me.

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

Direct reasoning with complex logic can be challenging for LLMs
Simple rule application works well, but performance degrades when multiple steps of inference are required
LLMs excel when used as agents to generate code for solving logical problems
The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that while LLMs may struggle with certain types of direct reasoning, they can be incredibly effective when properly applied as components in a larger system. This represents a significant advancement in software development capabilities, demonstrating how LLMs can be transformative when used strategically rather than as standalone reasoning engines.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.


Oliver King-Smith is CEO of smartR AI, a company that empowers organizations to extract real value from their data in an ethical, responsible, and sustainable manner using cutting-edge AI technology.
