Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple argued that LLMs can't reason, it overlooked one crucial detail: a hammer isn't meant to turn screws. In our study of Einstein's classic logic puzzle, we found something interesting. When language models reasoned through the puzzle directly, they stumbled and made amusing claims like "the Plumber does not drive a Porsche". They excelled at a different task: writing code that a constraint solver could use to solve the puzzle.
Image Credit: SmartR AI

Introduction to LLMs and the Reasoning Debate


A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. The claim has some merit for out-of-the-box performance. This article shows, however, that LLMs can solve complex reasoning problems when they are applied appropriately, here as code generators feeding a dedicated solver.

The Initial Experiment: Einstein’s Puzzle

We set out to test LLM reasoning capabilities using Einstein’s puzzle. In this complex logic problem, you combine 15 clues about five houses and the characteristics of their occupants to determine who owns the fish. Our initial tests with leading LLMs showed mixed results:

  • OpenAI’s model correctly guessed the answer, but without clear reasoning
  • Claude provided an incorrect answer
  • When we modified the puzzle to use new attributes (cars, hobbies, drinks, colors, and jobs), both models failed badly

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

  1. Make guesses about house arrangements
  2. Use critics to evaluate rule violations
  3. Feed this information back for the next round
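The three steps above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not our implementation. It uses a toy three-house puzzle with hypothetical rules. The proposer is a deterministic stub standing in for the LLM, and the critics are plain Python predicates rather than LLM calls. It also follows a single path of guesses, so it is not a full tree search with branching. The names `guess_and_critique`, `stub_proposer`, and `CRITICS` are hypothetical.

```python
from itertools import permutations
from typing import Callable, Iterator, Optional

# An arrangement lists each attribute's values in house order (index 0 = House 1).
Arrangement = dict[str, tuple[str, ...]]
# A critic checks one rule and returns a violation message, or None if the rule holds.
Critic = Callable[[Arrangement], Optional[str]]
# A proposer takes the previous round's violations and returns a new guess (or None).
Proposer = Callable[[list[str]], Optional[Arrangement]]


def house_of(arr: Arrangement, attr: str, value: str) -> int:
    """1-based house number whose `attr` equals `value`."""
    return arr[attr].index(value) + 1


# Hypothetical toy rules for a three-house puzzle (not the article's rule set).
CRITICS: list[Critic] = [
    lambda a: None if house_of(a, "job", "Plumber") == house_of(a, "color", "Pink")
    else "The Plumber must live in the Pink house",
    lambda a: None if house_of(a, "job", "Teacher") == 3
    else "The Teacher must live in House 3",
    lambda a: None if house_of(a, "color", "Orange") == 1
    else "The Orange house must be House 1",
]


def stub_proposer() -> Proposer:
    """Deterministic stand-in for the LLM: walks through every arrangement in order.

    A real proposer would use the feedback to make a better guess; this stub
    ignores it so that the example is reproducible.
    """
    candidates: Iterator[Arrangement] = (
        {"job": jobs, "color": colors}
        for jobs in permutations(["Plumber", "Teacher", "Chef"])
        for colors in permutations(["Pink", "Orange", "Blue"])
    )
    return lambda feedback: next(candidates, None)


def guess_and_critique(propose: Proposer, critics: list[Critic],
                       max_rounds: int = 50) -> Optional[Arrangement]:
    feedback: list[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)          # 1. guess, given last round's feedback
        if candidate is None:                  # the proposer has run out of guesses
            return None
        verdicts = (critic(candidate) for critic in critics)
        feedback = [v for v in verdicts if v is not None]  # 2. collect rule violations
        if not feedback:                       # every critic is satisfied
            return candidate
        # 3. `feedback` is handed to the proposer on the next iteration
    return None


print(guess_and_critique(stub_proposer(), CRITICS))
# {'job': ('Chef', 'Plumber', 'Teacher'), 'color': ('Orange', 'Pink', 'Blue')}
```

The weak points described next sit in step 2. With LLM critics, a rule check can itself be wrong, and a wrong verdict is then fed back as if it were true.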

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”
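The check the critic should have made is mechanical. The minimal sketch that follows assumes each attribute is stored as a list in house order. The helper names and the placeholder jobs and colors are hypothetical; only the Plumber, Pink, and Orange positions come from the critic's response above.

```python
def house_of(arrangement: dict[str, list[str]], attr: str, value: str) -> int:
    """1-based house number whose `attr` equals `value`."""
    return arrangement[attr].index(value) + 1


def lives_next_to(arrangement: dict[str, list[str]],
                  attr_a: str, val_a: str, attr_b: str, val_b: str) -> bool:
    """True only when the two values are in adjacent houses (distance exactly 1).

    Living *in* the house gives distance 0, so it does not count as "next to".
    """
    return abs(house_of(arrangement, attr_a, val_a)
               - house_of(arrangement, attr_b, val_b)) == 1


# The critic's own scenario: Plumber in House 2, House 2 is Pink, House 1 is Orange.
# The remaining jobs and colors are hypothetical placeholders.
arrangement = {
    "jobs": ["Teacher", "Plumber", "Chef", "Pilot", "Nurse"],
    "colors": ["Orange", "Pink", "Blue", "Green", "White"],
}

print(lives_next_to(arrangement, "jobs", "Plumber", "colors", "Pink"))  # False
```

Under the critic's own premises, the Plumber and the Pink house share House 2 (distance 0). The rule is therefore violated, not satisfied.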

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed clear limitations, we found that LLMs excel when used as code generators. We asked SCOTi to write code in MiniZinc, a constraint modelling language, and it produced a well-formed constraint programming model of the puzzle. The key advantages of this approach were:

  1. Each rule could be cleanly translated into code statements
  2. The resulting code was highly readable
  3. MiniZinc could solve the puzzle efficiently

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

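% Statement 11: The man who enjoys Music lives next to the man who drives Porsche
% Note: /\ means AND in MiniZinc
% hobbies[i] and cars[j] hold the hobby of the person in house i and the car of the person in house j
constraint exists(i, j in 1..5)(abs(i - j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);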

If you would like to get the full MiniZinc code, please contact me.
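Neither the full MiniZinc model nor the complete rule set of our modified puzzle is reproduced here. The sketch that follows therefore uses the classic Einstein puzzle: the well-known version with nationalities, colors, drinks, cigarette brands, and pets, where the question is who owns the fish. It is plain Python generate-and-test with pruning, not MiniZinc and not constraint propagation. It does show the property the MiniZinc code relied on: each clue becomes one short, readable check. Clue 4 ("the green house is on the left of the white house") is read as "immediately to the left", the usual interpretation.

```python
from itertools import permutations


def next_to(i: int, j: int) -> bool:
    """Houses i and j are neighbours (indices 0..4, House 1 = index 0)."""
    return abs(i - j) == 1


def index_of(values: tuple[str, ...]) -> dict[str, int]:
    """Map each value to the index of the house that has it."""
    return {v: i for i, v in enumerate(values)}


def solve() -> list[str]:
    """Return the fish owner for every arrangement that satisfies all 15 clues."""
    owners = []
    for colors in permutations(["Red", "Green", "White", "Yellow", "Blue"]):
        c = index_of(colors)
        if c["Green"] + 1 != c["White"]:                      # 4: green just left of white
            continue
        for nations in permutations(["Brit", "Swede", "Dane", "Norwegian", "German"]):
            n = index_of(nations)
            if (n["Brit"] != c["Red"]                          # 1: Brit in the red house
                    or n["Norwegian"] != 0                     # 9: Norwegian in house 1
                    or not next_to(n["Norwegian"], c["Blue"])):  # 14: Norwegian next to blue
                continue
            for drinks in permutations(["Tea", "Coffee", "Milk", "Beer", "Water"]):
                d = index_of(drinks)
                if (d["Tea"] != n["Dane"]                      # 3: Dane drinks tea
                        or d["Coffee"] != c["Green"]           # 5: green house drinks coffee
                        or d["Milk"] != 2):                    # 8: centre house drinks milk
                    continue
                for smokes in permutations(["Pall Mall", "Dunhill", "Blends", "BlueMaster", "Prince"]):
                    s = index_of(smokes)
                    if (s["Dunhill"] != c["Yellow"]            # 7: yellow house smokes Dunhill
                            or s["BlueMaster"] != d["Beer"]    # 12: BlueMaster smoker drinks beer
                            or s["Prince"] != n["German"]      # 13: German smokes Prince
                            or not next_to(s["Blends"], d["Water"])):  # 15: Blends next to water
                        continue
                    for pets in permutations(["Dogs", "Birds", "Cats", "Horses", "Fish"]):
                        p = index_of(pets)
                        if (p["Dogs"] == n["Swede"]            # 2: Swede keeps dogs
                                and p["Birds"] == s["Pall Mall"]        # 6: Pall Mall rears birds
                                and next_to(s["Blends"], p["Cats"])     # 10: Blends next to cats
                                and next_to(p["Horses"], s["Dunhill"])):  # 11: horses next to Dunhill
                            owners.append(nations[p["Fish"]])
    return owners


print(solve())  # ['German']
```

Placing one attribute at a time keeps the search small. Each clue is tested as soon as every attribute it mentions has been placed, so most partial arrangements are discarded early. A solver such as MiniZinc automates this kind of pruning, and far more.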

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

  1. Direct reasoning with complex logic can be challenging for LLMs
  2. Simple rule application works well, but performance degrades when multiple steps of inference are required
  3. LLMs excel when used as agents to generate code for solving logical problems
  4. The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that LLMs may struggle with certain types of direct reasoning, yet they can be highly effective as components in a larger system. In this experiment, the LLM translated natural-language rules into a formal model, and a dedicated solver performed the search. Used this way, LLMs can substantially change how such problems are tackled in software development.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.
