Axiomatic Reasoning for LLMs

The Asymmetry of Two Self-Verification Modalities in Large Language Models

1. Structural Divergence

Large language models exhibit two distinct verification patterns when prompted to re-examine their outputs. Intrinsic self-correction operates within a closed parametric loop where the generator and evaluator share identical knowledge boundaries and error modes. Retrieval-augmented verification introduces an open loop where external evidence sources inject information not contained in model weights.

Intrinsic Self-Correction (Closed-Loop)

Retrieval-Augmented Verification (Open-Loop)

2. Performance Trajectories

2.1 Intrinsic Self-Correction Degradation

Empirical measurements across multiple model families demonstrate accuracy collapse under iterative self-critique on tasks where initial correctness exceeds 75%. Snorkel AI experiments record a 41.2% absolute drop (98.1% → 56.9%) across 50 tasks with five iterations. Self-Correction Bench quantifies a mean blind-spot rate of 64.5% across 14 models, indicating that self-generated errors evade internal detection while identical errors from external sources are corrected.

The degradation follows correlated error propagation. The generator and evaluator share failure modes, transforming self-evaluation into weak evidence amplification rather than error detection. The “answer wavering” phenomenon causes responses to oscillate between correct and incorrect states, often terminating at an incorrect plateau.

2.2 Retrieval-Augmented Verification Improvement

Frameworks employing falsification-aligned retrieval invert standard RAG workflows. Initial responses are treated as draft hypotheses subjected to anti-context stress testing. FVA-RAG achieves 79.8–80.1% accuracy on TruthfulQA-Generation, exceeding Self-RAG by 8.7 percentage points and CRAG by 6.9–8.7 points. CounterRefine attains 73.1% on SimpleQA, surpassing GPT-5 baseline RAG by 5.8 points and GPT-5.4 one-shot by approximately 40 points.

Iterative RAG surpasses gold-context baselines by up to 25.6 percentage points across 11 model families. This outcome demonstrates that stepwise hypothesis refinement outperforms provision of all oracle evidence simultaneously, due to distributed cognitive load and dynamic correction of early hypothesis drift.

3. Information-Theoretic Formulation

3.1 Closed-Loop Entropy Amplification

Brilliant (TechRxiv 2026) formalizes intrinsic self-correction as a correlated-error channel. Generator-evaluator identity yields zero relative information gain per iteration. The process amplifies confidence without adding new evidential content, increasing output entropy and entrenching incorrect responses.

3.2 Open-Loop Negentropy Injection

RAG systems function as cascaded information channels where retrieval constitutes the bottleneck. Each retrieval iteration traverses this channel from a distinct query angle, expanding effective capacity beyond single-pass limits. Falsification-aligned queries maximize independence between retrieved evidence sets, mitigating diminishing information returns.

Merlin-Arthur protocol training quantifies the explained information fraction (EIF) flowing from context to answer. High accuracy alone does not guarantee high EIF; models may generate correct outputs while ignoring retrieved evidence. Falsification-verification frameworks enforce evidential grounding through anti-context stress testing and deterministic revision gating.

4. Geometric Stability

ISI-ERA analysis maps attention-graph dynamics to Ollivier-Ricci curvature. Positive curvature correlates with semantic coherence; negative curvature predicts collapse analogous to gravitational singularities. Intrinsic self-correction loops degrade curvature through self-referential reinforcement, inducing semantic degradation even when outputs remain syntactically fluent.

Retrieval-augmented verification maintains positive curvature by injecting structured, verified information. Balanced entropy engineering stabilizes attention sensitivity across varying context lengths, decoupling attention dynamics from retrieval volume fluctuations.

5. Bounded Efficacy

5.1 The Applicability Problem

Retrieved passages may satisfy truth and topical relevance while failing to apply to specific query circumstances. Corporate and organizational corpora contain conditional truths branching on region, eligibility tier, product version, and effective date. Similarity-based retrieval cannot resolve these implicit conditionals, producing franken-answers that blend incompatible truth branches.

5.2 Semantic Illusion

Embedding-based hallucination detectors achieve 100% false-positive rates on RLHF-tuned model outputs. Surface semantics fail to distinguish faithful from hallucinated responses; GPT-4 acting as judge achieves 7% false-positive rates on identical data. The hardest hallucinations are semantically indistinguishable from correct answers.

5.3 Semantic Drift

Repeated transformation through language model layers erodes nuance, metaphor, and intentional structure even when factual accuracy remains intact. Each iteration of retrieval-integration-verification introduces irreversible information loss. Fluency persists while meaning degrades.

5.4 Noise Propagation

RAGuard benchmarks demonstrate that all tested RAG systems underperform zero-shot baselines when exposed to misleading retrieval results. Misinformation functions as active contaminant rather than passive absence of signal. Retrieval amplifies noise when source quality cannot be discriminated.

6. Mechanism Summary

Modality Loop Type Information Flow Error Mode Outcome
Intrinsic Self-Correction Closed Zero external gain Correlated error amplification Accuracy degradation
Retrieval-Augmented Verification Open Iterative external injection Evidence-grounded falsification Accuracy improvement

7. Implications for System Design

Verification architecture should separate generation and evaluation into distinct functional components. Falsification-aligned retrieval transforms the system from a self-referential loop into a hypothesis-testing pipeline. Deterministic gating mechanisms prevent erroneous revision by conditioning acceptance on evidential verification.

The bounded efficacy of retrieval-augmented verification originates in properties inherent to information itself: conditional truth structures, multilayer semantics, transformation irreversibility, and noise propagation. System optimization cannot transcend these bounds but can asymptotically approach them through open-loop, falsification-oriented retrieval architectures.