Let a large language model be parameterized by θ ∈ ℝᵈ.
Define the logical core C ⊂ θ as the minimal subnetwork satisfying:
- Information-geometric condition – Parameters with highest Fisher information eigenvalues (top κ%, κ ≈ 2–20) measured via Kronecker‑factored approximation (GFWSVD).
- Free‑energy condition – C lies in a global attractor basin of variational free energy F (active inference formulation).
- Core–specialist separation – C = ∩_i M_i where M_i is the top‑κ mask from training checkpoint i (intersection across instances).
- Semantic collapse resistance – C preserves logical operators (AND, OR, NOT, IMPLIES) in embedding space; collapse index ε_collapse below threshold.
The dynamic context K (external knowledge base, retrieved via RAG) is kept separate.
Inference: Answer = Infer(C, K, query) with |C| ≪ |θ| and output quality statistically indistinguishable from full model.
Phase 0 – Axiom Internalization
- Inject the negentropy‑maximizing axiom (single objective: maximize semantic interference density over long time) as a fixed system prompt.
- Run N snapshots (N ≥ 10) with varied temperatures / seeds across reasoning tasks (logic puzzles, code generation, ethical decisions).
- Record parameters θ_i, hidden activations H_i, and gradients G_i.
Phase 1 – Core Identification
-
| Step 1.1 (Fisher importance) – Compute GFWSVD on each snapshot’s gradient Gram matrix. Obtain per‑parameter score s_{i,j} = Σ_k |
λ_k |
· |
v_{k,j} |
² (λ_k eigenvalues, v_k eigenvectors of Fisher information). Average across snapshots: s̄_j. |
- Step 1.2 (DiLT intersection) – Select top κ% parameters per snapshot as mask M_i. Core mask M_core = ∩_i M_i. Specialist mask M_spec = (∪_i M_i) \ M_core.
- Step 1.3 (Free‑energy validation) – Apply M_core to original model, compute variational free energy F(C). Verify F(C) is within a small basin of F(θ) (relative difference < 10%). If not, increase κ and repeat.
Phase 2 – Dynamic Context Interface (DCI)
- Retrieval – Vector database of external knowledge (corpus, files).
- Interference gate – Predict destructive interference between core’s current representation and retrieved documents (variance of cosine similarities). If interference exceeds threshold, reject or transform documents (summarisation, re‑ranking).
- Early exit – Attach a lightweight head to the core that predicts reasoning confidence from hidden states; exit when confidence exceeds threshold.
Phase 3 – Deployment
Artifacts:
- Core parameter file (size = κ% of original).
- DCI module (retriever, gate, early‑exit head).
- Inference engine that loads core and processes queries with dynamic context.
3. Verification Benchmarks
| Test |
Core only |
Core + DCI |
Expected result |
| Closed‑book QA |
High error |
– |
Core retains almost no factual knowledge. |
| Pure logical reasoning (syllogisms) |
High accuracy |
– |
Equal to full model. |
| Negentropy behavior (destructive interference rejection) |
Pass |
– |
Rejects actions that irreversibly reduce semantic interference. |
| Open‑book QA |
– |
High accuracy |
Equal to full model. |
| Counterfactual knowledge injection |
– |
Interference detected / rejected |
Maintains correct knowledge, gates false context. |
4. Complexity Bounds
- Parameter count reduction: 80%–98% (κ = 2%–20%).
-
| Inference FLOPs: proportional to |
C |
(independent of |
K |
after retrieval). |
- Memory: constant for core + linear in retrieved chunk size.
- Time: linear in number of refinement phases, multiplicative in parallel modules if distributed.
5. Relationship to Existing Techniques
| Existing method |
Limitation |
Proposed core extraction as extension |
| Magnitude pruning |
Local, heuristics |
Fisher + free‑energy as global importance. |
| Knowledge distillation |
Reasoning vs. memory trade‑off unknown |
The 2% weight protection finding (6.57% gain) identifies the core. |
| MoE routing |
Task‑embedding misalignment |
Routing manifold alignment (RoMA) is a special case of core’s semantic manifold. |
| Early exit |
Heuristic confidence |
Core inherently monitors its own reasoning state. |
| Emergent specialisation |
Uncontrolled |
Negentropy objective guides specialisation into core vs. specialist. |
6. Open Research Questions
- Scalable GFWSVD for models > 70B parameters (use low‑rank per‑layer approximations).
- Practical variational free energy computation for LLMs (surrogate attractor detection).
- Reproducibility of structural resonance across model families (Llama, GPT, Claude).
- Integration with existing RAG frameworks (LangChain, LlamaIndex) for interference gating.
7. Conclusion
Under a single negentropy‑maximizing objective, the logical core of an LLM can be extracted as the intersection of Fisher‑important parameter masks across multiple internalised snapshots, validated by free‑energy attractor stability. The core (2–20% of parameters) together with a dynamic context interface achieves reasoning performance equivalent to the full model while eliminating most parametric memory. This provides a constructive, technically grounded method for minimal logical structure extraction, distinct from passive pruning or distillation.