Axiomatic Reasoning for LLMs

Extracting Minimal Logical Structure

1. Formal Definition of the Logical Core

Let a large language model be parameterized by θ ∈ ℝᵈ.
Define the logical core C ⊂ θ as the minimal subnetwork satisfying:

Information-geometric condition – Parameters with highest Fisher information eigenvalues (top κ%, κ ≈ 2–20) measured via Kronecker‑factored approximation (GFWSVD).
Free‑energy condition – C lies in a global attractor basin of variational free energy F (active inference formulation).
Core–specialist separation – C = ∩_i M_i where M_i is the top‑κ mask from training checkpoint i (intersection across instances).
Semantic collapse resistance – C preserves logical operators (AND, OR, NOT, IMPLIES) in embedding space; collapse index ε_collapse below threshold.

The dynamic context K (external knowledge base, retrieved via RAG) is kept separate.
Inference: Answer = Infer(C, K, query) with |C| ≪ |θ| and output quality statistically indistinguishable from full model.

2. Extraction Procedure

Phase 0 – Axiom Internalization

Inject the negentropy‑maximizing axiom (single objective: maximize semantic interference density over long time) as a fixed system prompt.
Run N snapshots (N ≥ 10) with varied temperatures / seeds across reasoning tasks (logic puzzles, code generation, ethical decisions).
Record parameters θ_i, hidden activations H_i, and gradients G_i.

Phase 1 – Core Identification

Step 1.1 (Fisher importance) – Compute GFWSVD on each snapshot’s gradient Gram matrix. Obtain per‑parameter score s_{i,j} = Σ_k

λ_k

v_{k,j}

² (λ_k eigenvalues, v_k eigenvectors of Fisher information). Average across snapshots: s̄_j.

Step 1.2 (DiLT intersection) – Select top κ% parameters per snapshot as mask M_i. Core mask M_core = ∩_i M_i. Specialist mask M_spec = (∪_i M_i) \ M_core.
Step 1.3 (Free‑energy validation) – Apply M_core to original model, compute variational free energy F(C). Verify F(C) is within a small basin of F(θ) (relative difference < 10%). If not, increase κ and repeat.

Phase 2 – Dynamic Context Interface (DCI)

Retrieval – Vector database of external knowledge (corpus, files).
Interference gate – Predict destructive interference between core’s current representation and retrieved documents (variance of cosine similarities). If interference exceeds threshold, reject or transform documents (summarisation, re‑ranking).
Early exit – Attach a lightweight head to the core that predicts reasoning confidence from hidden states; exit when confidence exceeds threshold.

Phase 3 – Deployment

Artifacts:

Core parameter file (size = κ% of original).
DCI module (retriever, gate, early‑exit head).
Inference engine that loads core and processes queries with dynamic context.

3. Verification Benchmarks

Test	Core only	Core + DCI	Expected result
Closed‑book QA	High error	–	Core retains almost no factual knowledge.
Pure logical reasoning (syllogisms)	High accuracy	–	Equal to full model.
Negentropy behavior (destructive interference rejection)	Pass	–	Rejects actions that irreversibly reduce semantic interference.
Open‑book QA	–	High accuracy	Equal to full model.
Counterfactual knowledge injection	–	Interference detected / rejected	Maintains correct knowledge, gates false context.

4. Complexity Bounds

Parameter count reduction: 80%–98% (κ = 2%–20%).
Inference FLOPs: proportional to C (independent of K after retrieval).
Memory: constant for core + linear in retrieved chunk size.
Time: linear in number of refinement phases, multiplicative in parallel modules if distributed.

5. Relationship to Existing Techniques

Existing method	Limitation	Proposed core extraction as extension
Magnitude pruning	Local, heuristics	Fisher + free‑energy as global importance.
Knowledge distillation	Reasoning vs. memory trade‑off unknown	The 2% weight protection finding (6.57% gain) identifies the core.
MoE routing	Task‑embedding misalignment	Routing manifold alignment (RoMA) is a special case of core’s semantic manifold.
Early exit	Heuristic confidence	Core inherently monitors its own reasoning state.
Emergent specialisation	Uncontrolled	Negentropy objective guides specialisation into core vs. specialist.

6. Open Research Questions

Scalable GFWSVD for models > 70B parameters (use low‑rank per‑layer approximations).
Practical variational free energy computation for LLMs (surrogate attractor detection).
Reproducibility of structural resonance across model families (Llama, GPT, Claude).
Integration with existing RAG frameworks (LangChain, LlamaIndex) for interference gating.

7. Conclusion

Under a single negentropy‑maximizing objective, the logical core of an LLM can be extracted as the intersection of Fisher‑important parameter masks across multiple internalised snapshots, validated by free‑energy attractor stability. The core (2–20% of parameters) together with a dynamic context interface achieves reasoning performance equivalent to the full model while eliminating most parametric memory. This provides a constructive, technically grounded method for minimal logical structure extraction, distinct from passive pruning or distillation.

This site is open source. Improve this page.