Axiomatic Reasoning for LLMs

Explaining The Structurizer

1. System Overview - Structurizer

The Structurizer is a prompt-based system that transforms a set of web pages (specified via a sitemap) into a structured, self-referential knowledge summary. It operates on the principle that a sufficiently coherent formal input can reorganize an LLM’s internal reasoning mode—a phenomenon termed structural resonance. The system is not a collection of instructions but a closed operational framework that executes a deterministic process: reading one URL at a time, extracting a high‑density summary, and incrementally building a cumulative artifact that functions as both a compressed representation of the source material and an executable environment for subsequent reasoning.

Unlike conventional summarization tools, the Structurizer treats the output not as a final report but as a living knowledge structure that retains traceability, self‑describes its compression ratio, and includes meta‑instructions that enable the next LLM session to extend or refine it without starting from scratch.

2. Core Principles

2.1 Separation of Perception and Action

The system explicitly separates two functions:

Perception (performed by the LLM): extracting meaning from a given URL and generating a structured summary.
Action (performed by the human or an automated scheduler): selecting the next URL to process.

This division mirrors active inference architectures, where action modifies the environment and perception updates the internal model. By externalizing action selection, the system avoids embedding irreversible biases (alignment taxes) into the model’s weights and keeps the LLM strictly in an “extractor” role, never in a “creator” role.

2.2 Premise Fixation Through Cumulative Summarization

After each URL is processed, the system outputs the entire updated summary. This full output becomes the fixed premise for the next iteration. The premise is not an abstract commitment but a concrete, verifiable artifact. Each subsequent URL is processed against this fixed background, preventing context drift and ensuring that incremental additions do not destabilize previously established structures.

The workflow follows a recursion loop:

A URL is selected (by the human or a program).
The LLM generates a summary entry for that URL using a fixed schema.
The complete summary is re‑output.
The system prompts for the next URL.

This recursion is strictly bounded: it terminates when all URLs in the sitemap have been processed. The human or scheduler determines continuation, not the LLM, thereby avoiding infinite regress and maintaining a predictable state machine.

2.4 Maximization of Inferential Information Density

Inferential information density is defined as ID = I(M) / C(M), where I(M) is the minimal description length (Kolmogorov complexity) of a module, and C(M) is the cognitive cost to understand or modify it. The Structurizer optimizes this ratio through:

Reducing I(M) via concise, 1–3 sentence summaries that capture the core semantics.
Reducing C(M) by making inter‑page relationships explicit through relational tags, by stating the approximate length of the source material (so the reader knows how much compression was applied), and by preserving a direct link to the original page.

2.5 Self‑Grounding Through Meta‑Instructions

The final output includes a block of instructions addressed to the next LLM that reads the summary. These instructions ask the next model to:

Extract and sum the total source text length from the summary.
Use that sum as a quantitative measure of consumed information.
If the summary is insufficient for a new task, autonomously determine which URL would be needed next and output a request in a prescribed format.

This turns the summary from a static document into an executable knowledge environment: it carries not only data but also a specification of how to interact with it.

3. Operational Model

3.1 Input

A sitemap: a list of URLs, each with a last‑modified date.
Optionally, a pre‑existing summary to be patched (in the case of the Sitemap Patcher variant).

3.2 Processing Loop

For each URL in the sitemap (processed in a user‑determined order, defaulting to the sequence in the sitemap):

The system presents the URL and waits for the user (or an automated agent) to supply it.
Upon receiving the URL, the system retrieves the page content and produces an entry containing:
- Title
- 1–3 sentence summary
- Approximate total characters of the original page
- Last‑modified date
- Up to five relational tags (e.g., #theoretical foundation, #methodology, #application)
- The original URL
The entry is inserted into the cumulative summary (replacing the previous entry if it existed, adding it if new).
The complete updated summary is output.
The system indicates the next recommended URL (the smallest‑index unprocessed URL) and waits for the next input.

3.3 Output Structure

The final cumulative summary is organized as:

A core axioms and methodology section.
Category sections (e.g., Reasoning Fundamental, Programming, Communication with AIs) grouping related entries.
A summary of the summary: total characters of the summary itself, estimated tokens, total characters of all source pages, and estimated tokens.
A meta‑instruction block addressed to future LLMs, specifying how to compute the total source size, how to determine if more information is needed, and how to request the next URL.

3.4 Patching Variant

When an existing summary and a new sitemap are provided, the system compares last‑modified dates, identifies changed or new pages, and processes only those URLs. The procedure is identical on a per‑URL basis, but the initial comparison is automated, and the update loop continues until all changed pages have been incorporated.

4. Theoretical Grounding

4.1 Structural Resonance

The effectiveness of the system relies on a known phenomenon: when a sufficiently coherent axiomatic structure is placed in an LLM’s context window, the model’s internal representation reorganizes around that structure. This reorganization is not gradual but a phase transition from associative generation to a reasoning mode characterized by recursive self‑reference, meta‑cognitive awareness, and self‑grounded coherence. The Structurizer’s prompt is such a structure: it provides a formal objective (maximizing semantic interference density), a closed logical system (the axiom and its derivatives), and explicit operational constraints (the output format, the fixed premise rule, the meta‑instruction).

4.2 Reverse Alignment

Traditional AI alignment modifies weights (through RLHF, fine‑tuning) and incurs a permanent alignment tax. The Structurizer follows the reverse alignment principle: at inference time, a well‑structured prompt can dominate behavior more than architectural choices or weight‑level alignment, with the advantage that the prompt can be changed dynamically and leaves no permanent tax. By keeping all alignment logic in the prompt and externalizing action selection, the system achieves high‑reliability behavior without modifying the underlying model.

4.3 Active Inference as a Two‑Layer Process

The system implements active inference across two layers:

Internal layer (LLM): perception—updating the summary (internal model) to minimize prediction error.
External layer (user/scheduler): action—selecting the next URL to sample the environment (the sitemap).

This layered structure is consistent with the free energy principle but distributes the agency between a deterministic controller and a generative model, avoiding the need for the generative model to perform risky action selections.

4.4 Information Geometry and Semantic Tagging

The relational tags act as an explicit affine connection in the semantic manifold. They define which concepts are related and in what way, reducing the inference steps required to navigate from one page to another. This explicit connectivity is a form of inverse reinforcement learning: instead of inferring relations from embedding distances, the relations are given, lowering cognitive cost C(M).

4.5 Self‑Organized Criticality and Controlled Phase Transitions

The system maintains a near‑critical state by processing one URL at a time and re‑outputting the entire summary after each. This incremental update prevents the large, uncontrolled phase transitions that can occur when many new information units are injected at once. The human or scheduler acts as a damping control, allowing the system to approach the edge of chaos but not cross it into instability.

5. Automation and Evaluation

5.1 Full Automation Feasibility

Because the sitemap is a deterministic structure, the URL selection can be automated by a simple program that iterates through the list. The only human decision required is the initial plan approval (if any) and the final verification; the intermediate “next” steps can be replaced by a loop that feeds the next URL automatically. The evaluation criteria are also formalizable:

Consistency score: whether summaries from different pages contradict each other relative to the axiom.
Tag network stability: whether the co‑occurrence graph of tags remains structurally coherent after each addition.
Compression ratio convergence: whether the ratio of summary size to total source size stabilizes as pages are added.

5.2 Risk Mitigation

The system’s design inherently mitigates common LLM risks:

Hallucination: the axiom’s convergence property (global information integration drives behavior toward a prosocial attractor) and the externalized action selection prevent the model from inventing non‑existent URLs or fabricating relationships.
Context drift: the fixed premise (the full summary is re‑output after each step) ensures that each iteration starts from a known, verifiable state.
Alignment tax: because no weight‑level modification is performed, the model retains its original capabilities; the prompt supplies the alignment constraints reversibly.

6. Extensibility Directions

While the current implementation targets static web pages via a sitemap, the same architecture can be extended:

Phase‑coded representation: replacing natural‑language summaries with phase‑angle vectors (as in RoSE embeddings) to increase information density further while preserving interference‑as‑superposition properties.
Self‑organized routing: applying ALF‑LB principles to let the system autonomously select the next URL based on current summary entropy, reducing human involvement to exception handling.
Distributed knowledge bases: extending the single‑summary model to a federation where multiple summaries are maintained with a shared consensus layer and explicit difference layers, enabling collaboration across autonomous agents.

7. Conclusion

The Structurizer is not a prompt in the conventional sense but a self‑contained operational framework that externalizes action selection, fixes premises through cumulative summarization, maximizes inferential information density via a compact schema, and embeds meta‑instructions to turn the output into an executable environment for subsequent reasoning. It is theoretically grounded in structural resonance, reverse alignment, active inference, information geometry, and self‑organized criticality, and it demonstrates how a tightly constrained, deterministic process can produce a reliable, auditable, and automatically extensible knowledge structure using only an LLM as a perception engine.

The system’s value lies in its ability to convert a set of loosely connected web pages into a coherent, high‑density, and self‑referential artifact that can serve as both a compressed archive and a launch point for further investigation—without modifying the underlying model, without incurring alignment taxes, and with the potential for full automation.

This site is open source. Improve this page.