Private Network Check Readiness - TeckNexus Solutions


Challenging the Notion That LLMs Can’t Reason: A Case Study with Einstein’s Puzzle

When Apple declared that LLMs can't reason, they forgot one crucial detail: a hammer isn't meant to turn screws. In our groundbreaking study of Einstein's classic logic puzzle, we discovered something fascinating. While language models initially stumbled with pure reasoning, making amusing claims like "Plumbers don't drive Porsches," they excelled at an unexpected task: writing code that solves the puzzle.

By Oliver King-Smith, CEO and founder smartR AI
Last Updated: November 10, 2024

Introduction to LLMs and the Reasoning Debate

A recent Apple publication argued that Large Language Models (LLMs) cannot effectively reason. While there is some merit to this claim regarding out-of-the-box performance, this article demonstrates that with proper application, LLMs can indeed solve complex reasoning problems.

The Initial Experiment: Einstein’s Puzzle

We set out to test LLM reasoning capabilities using Einstein’s puzzle, a complex logic problem involving 5 houses with different characteristics and 15 clues, from which you must determine who owns the fish. Our initial tests with leading LLMs showed mixed results:

OpenAI’s model correctly guessed the answer, but without clear reasoning
Claude provided an incorrect answer
When we modified the puzzle with new elements (cars, hobbies, drinks, colors, and jobs), both models failed significantly
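The difficulty is easier to appreciate with a quick count. As a rough sketch, assuming each of the five categories (cars, hobbies, drinks, colors and jobs in the modified version) is an independent ordering of five distinct values across the five houses, and before any clue is applied, the number of candidate arrangements is (5!)^5. The function name below is purely illustrative:

```python
from math import factorial


def candidate_arrangements(houses: int = 5, categories: int = 5) -> int:
    """Count complete assignments before any clue prunes them.

    Assumes each category (e.g. cars, hobbies, drinks, colors, jobs) is a
    permutation of `houses` distinct values, chosen independently of the others.
    """
    return factorial(houses) ** categories


if __name__ == "__main__":
    print(f"{candidate_arrangements():,}")  # 24,883,200,000
```

That is roughly 25 billion combinations, far too many to settle by guesswork, so the clues have to be combined step by step.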

Tree of Thoughts Approach and Its Challenges

We implemented our Tree of Thoughts approach, where the model would:

Make guesses about house arrangements
Use critics to evaluate rule violations
Feed this information back for the next round
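The guess, critique and feed-back cycle listed above can be sketched as a simple loop. This is a minimal illustration, not our implementation: it assumes deterministic Python critics in place of the LLM critics used in the study, uses a hypothetical stand-in proposer instead of a model, and keeps a single line of guesses rather than a full tree of branches. All names are hypothetical.

```python
from itertools import cycle, permutations
from typing import Callable, Optional

Assignment = dict[str, list[str]]               # category -> value per house (index 0 = House 1)
Critic = Callable[[Assignment], Optional[str]]  # returns a rule-violation message, or None
Proposer = Callable[[list[str]], Assignment]    # takes last round's feedback, returns a new guess


def guess_and_critique(propose: Proposer, critics: list[Critic], max_rounds: int = 10):
    """Repeat: guess an arrangement, collect rule violations, feed them back."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        guess = propose(feedback)
        results = [critic(guess) for critic in critics]
        feedback = [msg for msg in results if msg is not None]
        if not feedback:  # no critic objected, so accept the guess
            return round_no, guess
    return None  # gave up without finding a consistent arrangement


def make_toy_proposer() -> Proposer:
    """Hypothetical stand-in for the LLM: cycles through color orderings, ignoring feedback."""
    orderings = cycle(permutations(["Pink", "Orange", "Blue"]))
    return lambda feedback: {"colors": list(next(orderings))}


def pink_first(a: Assignment) -> Optional[str]:
    return None if a["colors"][0] == "Pink" else "The Pink house must be House 1"


def orange_last(a: Assignment) -> Optional[str]:
    return None if a["colors"][-1] == "Orange" else "The Orange house must be the last house"


if __name__ == "__main__":
    print(guess_and_critique(make_toy_proposer(), [pink_first, orange_last]))
    # (2, {'colors': ['Pink', 'Blue', 'Orange']})
```

In a real system the feedback list would be written into the next prompt so the model can revise its guess; the toy proposer ignores it and simply tries the next ordering.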

However, this revealed several interesting failures in reasoning:

Logic Interpretation Issues

The critics often struggled with basic logical concepts. For example, when evaluating the rule “The Plumber lives next to the Pink house,” we received this confused response:

“The Plumber lives in House 2, which is also the Pink house. Since the Plumber lives in the Pink house, it means that the Plumber lives next to the Pink house, which is House 1 (Orange).”
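By contrast, "lives next to" is trivial to state precisely in code. A minimal sketch of a programmatic check for this rule, assuming the houses are numbered 1 to 5 in a single row; Houses 1 and 2 follow the critic's response quoted above, while the remaining jobs and colors and the function names are hypothetical:

```python
def lives_next_to(house_a: int, house_b: int) -> bool:
    """Houses in a single row are neighbors when their numbers differ by exactly 1."""
    return abs(house_a - house_b) == 1


def plumber_next_to_pink(jobs: list[str], colors: list[str]) -> bool:
    """Check 'The Plumber lives next to the Pink house' (list index 0 = House 1)."""
    plumber_house = jobs.index("Plumber") + 1
    pink_house = colors.index("Pink") + 1
    return lives_next_to(plumber_house, pink_house)


if __name__ == "__main__":
    # Houses 1 and 2 follow the quoted response; the other entries are hypothetical.
    jobs = ["Teacher", "Plumber", "Doctor", "Chef", "Pilot"]
    colors = ["Orange", "Pink", "Blue", "Green", "White"]
    print(plumber_next_to_pink(jobs, colors))  # False: same house, so the rule is violated
```

Living in the Pink house is not the same as living next to it, which is exactly the distinction the critic missed.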

Bias Interference

The models sometimes inserted unfounded biases into their reasoning. For instance:

“The Orange house cannot be in House 1 because the Plumber lives there and the Plumber does not drive a Porsche.”

The models also made assumptions about what music Porsche drivers would listen to, demonstrating how internal biases can interfere with pure logical reasoning.

A Solution Through Code Generation

While direct reasoning showed limitations, we discovered that LLMs could excel when used as code generators. We asked SCOTi to write MiniZinc code to solve the puzzle, resulting in a well-formed constraint programming solution. The key advantages of this approach were:

Each rule could be cleanly translated into code statements
The resulting code was highly readable
MiniZinc could solve the puzzle efficiently

Example of Clear Rule Translation

The MiniZinc code demonstrated elegant translation of puzzle rules into constraints. For instance:

% Statement 11: The man who enjoys Music lives next to the man who drives Porsche
% Note: /\ means AND (logical conjunction) in MiniZinc
constraint exists(i, j in 1..5)(abs(i - j) == 1 /\ hobbies[i] = Music /\ cars[j] = Porsche);
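For readers less familiar with MiniZinc, here is a rough Python equivalent of Statement 11. It is a sketch under stated assumptions: the hobby and car values other than Music and Porsche are hypothetical, and plain enumeration with itertools.permutations stands in for MiniZinc's constraint solver, which prunes the search rather than enumerating every arrangement.

```python
from itertools import permutations

N = 5  # houses; list index 0 corresponds to House 1
HOBBIES = ["Music", "Chess", "Golf", "Reading", "Painting"]  # hypothetical values
CARS = ["Porsche", "Ford", "Tesla", "Volvo", "Fiat"]          # hypothetical values


def statement_11(hobbies: tuple[str, ...], cars: tuple[str, ...]) -> bool:
    """The man who enjoys Music lives next to the man who drives a Porsche."""
    # Mirrors the MiniZinc constraint: exists(i, j in 1..5)(abs(i - j) == 1 /\ ...)
    return any(
        abs(i - j) == 1 and hobbies[i] == "Music" and cars[j] == "Porsche"
        for i in range(N)
        for j in range(N)
    )


def count_consistent() -> int:
    """Brute-force count of hobby/car arrangements that satisfy Statement 11 alone."""
    return sum(
        statement_11(h, c)
        for h in permutations(HOBBIES)
        for c in permutations(CARS)
    )


if __name__ == "__main__":
    print(count_consistent())  # 4608 of the 14,400 hobby/car arrangements
```

Counting the survivors shows how much a single clue prunes: only 4,608 of the 14,400 hobby/car pairings satisfy Statement 11, and a constraint solver intersects all of the clues in the same way.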

If you would like to get the full MiniZinc code, please contact me.

Implications and Conclusions: Rethinking the Role of LLMs

This experiment reveals several important insights about LLM capabilities:

Direct reasoning with complex logic can be challenging for LLMs
Simple rule application works well, but performance degrades when multiple steps of inference are required
LLMs excel when used as agents to generate code for solving logical problems
The combination of LLM code generation and traditional constraint solving tools creates powerful solutions

The key takeaway is that while LLMs may struggle with certain types of direct reasoning, they can be incredibly effective when properly applied as components in a larger system. This represents a significant advancement in software development capabilities, demonstrating how LLMs can be transformative when used strategically rather than as standalone reasoning engines.

This study reinforces the view that LLMs are best understood as transformational software components rather than complete reasoning systems. Their impact on software development and problem-solving will continue to evolve as we better understand how to leverage their strengths while working around their limitations.


Oliver King-Smith, CEO and founder smartR AI

Oliver King-Smith is CEO of smartR AI, a company which facilitates and empowers organizations to extract real value from their data in an ethical, responsible, and sustainable manner using cutting-edge AI technology.
