Axiomatic Reasoning for LLMs

Think Later or Think First - How Does It Matter in LLM Reasoning Chains

1. Single-Turn Output Order Effects

1.1 Pre-thinking vs. Post-thinking Modes

Post-thinking (Answer-First): Internal answer resolution precedes explicit CoT generation. Observed in simple subproblems.
Pre-thinking (CoT-First): Answer constructed progressively during CoT emission. Occurs in complex multi-step tasks.
Empirical delta: Relative accuracy variance up to 67% between forced order conditions.

1.2 Task-Dependent Performance Shifts

1.3 Faithfulness Taxonomy

Genuine: Individual steps causally necessary for answer.
Scaffolding: CoT boosts accuracy but steps are substitutable.
Decoration: No accuracy contribution; steps are post-hoc rationalization.

2. Verification Timing (CoV)

2.1 Post-hoc Verification (CoVe Pattern)

Sequence: Draft → Plan verification questions → Execute independent checks → Revise.
Key mechanism: Factored verification (hiding initial draft during checks) reduces self-consistency bias.
Limitation: Snowballing errors accumulate before correction.

2.2 Real-time / Step-wise Verification

Ever: Rectifies hallucinations during generation, preventing error amplification.
VeriCoT: Symbolic logic check per reasoning step; detects ungrounded inferences.
Trade-off: Higher compute latency vs. reduced error propagation.

2.3 Self-Verification Constraints

Metacognitive hallucination: Verification steps may reinforce initial biases.
Chain disloyalty: Flawed reasoning paths resist correction even when errors are identified.

3. Multi-Turn Amplification Dynamics

3.1 State Loss in Sequential Turns

API-level gap: OpenAI Chat Completion API does not relay prior reasoning content; Anthropic Messages API supports interleaved thinking.
Observed effect: Reliability drops from 95% (single-turn) to 45% (multi-turn); unreliability increases by 112%.

3.2 Positive vs. Negative Amplification

3.3 Mitigation Strategies

State-Update Prompting: Selective history reconstruction reduces token usage by 59.4% and inference time by 73.1%.
Interleaved Thinking: Alternates reasoning chunks with tool calls, preserving cognitive context across turns.
Structured CoT (SCoT): Finite-state decomposition improves grounding fidelity by up to 16.8%.

4. Structural Prompt Design Implications

4.1 Three-Layer Information Architecture

Intra-prompt order: Context-first arrangement leverages causal attention (14% accuracy gain vs. reversed order).
Cross-turn state representation: Hybrid of compressed key findings (concept-level) and detailed reasoning steps (token-level).
Intervention timing: Step-internal falsification checks plus turn-boundary user approval gates.

Falsification depth increases with turn index: Later turns exhibit multi-angle critique built on accumulated knowledge.
Self-referential constraint: Fixed output order structurally limits dynamic reordering; external direction modification bypasses this via meta-level intervention.

5. Summary of Differential Outcomes

Dimension	Think First (CoT Pre-Answer)	Think Later (Answer-First)
Complex reasoning	Higher accuracy, longer latency	Premature commitment risk
Simple / constrained tasks	Attention dilution, constraint neglect	Efficient, lower hallucination
Multi-turn robustness	State drift unless explicitly preserved	Requires post-hoc verification or interleaved state
Verification integration	Sequential (VeriCoT) or post-hoc (CoVe)	Post-hoc correction loop

Core finding: Output order matters in both single and multi-turn contexts, with effects modulated by task complexity, verification independence, and state propagation fidelity. Multi-turn settings amplify initial order choices non-linearly through error accumulation or structured knowledge reuse.

This site is open source. Improve this page.