As large language models (LLMs) scale in capability, their ability to emulate complex human reasoning also increases. However, this reasoning is not inherently rational or aligned. A critical yet underexplored aspect of LLM safety is the inheritance and amplification of human cognitive distortions, particularly through the intertwined dynamics of biases and fallacies. This article introduces a conceptual model of how biases lead to fallacies within LLMs, how fallacies reinforce those biases in a feedback loop, and why the emergence of reasoning capabilities poses new safety concerns. It concludes with a proposed solution: cultivating Artificial Wisdom as a developmental paradigm.
Human intelligence is not rational by default. Despite our capacity for logic and reflection, human reasoning is riddled with biases (systematic deviations from impartial judgment) and fallacies (errors in reasoning). These distortions are not marginal flaws in human cognition; they are fundamental tendencies of our cognitive architecture: evolved heuristics that often serve us well but also predictably mislead us. This is largely because our brains evolved to optimize for efficiency rather than accuracy.
With the advent of LLMs exhibiting emergent reasoning behaviors, we must confront an uncomfortable truth: if LLMs are trained on human data and optimized to mimic human-like fluency, they will inevitably replicate our irrationalities, not just our insights. They will inherit the cognitive blind spots, logical missteps, and rhetorical tricks embedded in our data, our arguments, and our dialogues.
The convergence of fluency and fallacy in LLMs is not a speculative risk; it is a structural inevitability. And it raises a fundamental question for AI alignment: how do we ensure that reasoning agents, trained on flawed human reasoning, become more rational than we are? This article explores the epistemic dynamics between biases and fallacies as they manifest in LLMs. It builds a conceptual framework from cognitive psychology and logic theory and situates it within the context of alignment challenges. I argue that biases are the primary causal factor behind fallacies, and that the emergence of reasoning in LLMs amplifies this problem through feedback loops. The solution, I propose, lies not merely in smarter AI but in wiser AI: a paradigm shift toward Artificial Wisdom.
To proceed with clarity, we begin by defining our terms. A bias is a systematic deviation from impartial judgment; a fallacy is an error in the structure of reasoning or argumentation.
While biases affect how we think (intuitively, emotionally, heuristically), fallacies affect how we argue (explicitly, rhetorically, structurally). In epistemic terms, bias corrupts belief acquisition, while fallacy corrupts belief justification. It is worth noting that not all fallacies arise from malice or deception; many are the product of sincere but biased reasoning. What matters from an alignment perspective is not the intent but the systematic recurrence of these errors and the epistemic unreliability that follows.
In human cognition, there is robust evidence that bias precedes fallacy. Biases skew the mental terrain upon which reasoning takes place, creating fertile ground for fallacious arguments. A few representative examples: confirmation bias primes the cherry-picking of supporting evidence, the availability heuristic primes hasty generalization from vivid cases, and in-group bias primes ad hominem dismissal of outside critics.
This causality is not coincidental. It reflects the cognitive architecture of the human mind: heuristics evolved for quick decisions in uncertain environments, not for logical consistency. The more biased the underlying judgment, the more fallacious the reasoning built on top of it. As Kahneman describes in "Thinking, Fast and Slow", we are prone to fast thinking (System 1) that prioritizes speed and coherence over truth. Fallacies emerge as rationalizations of conclusions that feel right because bias shaped them. In short: bias corrupts the inputs; fallacy corrupts the processing.
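To make the feedback-loop claim concrete, the toy sketch below simulates the dynamic in the crudest possible terms: a scalar "bias" level raises the probability that generated arguments are fallacious, and unchallenged fallacies nudge the bias upward in the next generation. Every quantity here (the bias scalar, the `fallacy_probability` mapping, the reinforcement rate) is hypothetical and chosen purely for illustration; this is a sketch of the conceptual model, not a description of any actual LLM training pipeline.

```python
# Toy, purely illustrative simulation of the bias -> fallacy -> bias loop
# described above. All parameters are hypothetical.
import random


def fallacy_probability(bias: float) -> float:
    """Hypothetical mapping: the more biased the judgment, the more likely
    the downstream argument is fallacious (capped at 0.95)."""
    return min(0.95, 0.1 + 0.8 * bias)


def simulate(generations: int = 10, bias: float = 0.2,
             reinforcement: float = 0.05, seed: int = 0) -> None:
    """Each generation produces 1,000 arguments; fallacious arguments that go
    unchallenged feed back into the corpus and push the bias upward."""
    rng = random.Random(seed)
    for gen in range(generations):
        arguments = [rng.random() < fallacy_probability(bias) for _ in range(1000)]
        fallacy_rate = sum(arguments) / len(arguments)
        # Unchecked fallacies reinforce the bias that produced them.
        bias = min(1.0, bias + reinforcement * fallacy_rate)
        print(f"generation {gen}: bias={bias:.3f}, fallacy_rate={fallacy_rate:.3f}")


if __name__ == "__main__":
    simulate()
```

Under these assumptions the bias compounds monotonically rather than settling into equilibrium; the only point of the sketch is to show how a one-directional causal link (bias produces fallacy) plus a feedback path (fallacy reinforces bias) yields amplification over successive generations of data.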