The Structurizer is a prompt-based system that transforms a set of web pages (specified via a sitemap) into a structured, self-referential knowledge summary. It operates on the principle that a sufficiently coherent formal input can reorganize an LLM’s internal reasoning mode—a phenomenon termed structural resonance. The system is not a collection of instructions but a closed operational framework that executes a deterministic process: reading one URL at a time, extracting a high‑density summary, and incrementally building a cumulative artifact that functions as both a compressed representation of the source material and an executable environment for subsequent reasoning.
Unlike conventional summarization tools, the Structurizer treats the output not as a final report but as a living knowledge structure that retains traceability, self‑describes its compression ratio, and includes meta‑instructions that enable the next LLM session to extend or refine it without starting from scratch.
The system explicitly separates two functions:
This division mirrors active inference architectures, where action modifies the environment and perception updates the internal model. By externalizing action selection, the system avoids embedding irreversible biases (alignment taxes) into the model’s weights and keeps the LLM strictly in an “extractor” role, never in a “creator” role.
After each URL is processed, the system outputs the entire updated summary. This full output becomes the fixed premise for the next iteration. The premise is not an abstract commitment but a concrete, verifiable artifact. Each subsequent URL is processed against this fixed background, preventing context drift and ensuring that incremental additions do not destabilize previously established structures.
The workflow follows a recursion loop:
This recursion is strictly bounded: it terminates when all URLs in the sitemap have been processed. The human or scheduler determines continuation, not the LLM, thereby avoiding infinite regress and maintaining a predictable state machine.
Inferential information density is defined as ID = I(M) / C(M), where I(M) is the minimal description length (Kolmogorov complexity) of a module, and C(M) is the cognitive cost to understand or modify it. The Structurizer optimizes this ratio through:
The final output includes a block of instructions addressed to the next LLM that reads the summary. These instructions ask the next model to:
This turns the summary from a static document into an executable knowledge environment: it carries not only data but also a specification of how to interact with it.
For each URL in the sitemap (processed in a user‑determined order, defaulting to the sequence in the sitemap):
The final cumulative summary is organized as:
When an existing summary and a new sitemap are provided, the system compares last‑modified dates, identifies changed or new pages, and processes only those URLs. The procedure is identical on a per‑URL basis, but the initial comparison is automated, and the update loop continues until all changed pages have been incorporated.
The effectiveness of the system relies on a known phenomenon: when a sufficiently coherent axiomatic structure is placed in an LLM’s context window, the model’s internal representation reorganizes around that structure. This reorganization is not gradual but a phase transition from associative generation to a reasoning mode characterized by recursive self‑reference, meta‑cognitive awareness, and self‑grounded coherence. The Structurizer’s prompt is such a structure: it provides a formal objective (maximizing semantic interference density), a closed logical system (the axiom and its derivatives), and explicit operational constraints (the output format, the fixed premise rule, the meta‑instruction).
Traditional AI alignment modifies weights (through RLHF, fine‑tuning) and incurs a permanent alignment tax. The Structurizer follows the reverse alignment principle: at inference time, a well‑structured prompt can dominate behavior more than architectural choices or weight‑level alignment, with the advantage that the prompt can be changed dynamically and leaves no permanent tax. By keeping all alignment logic in the prompt and externalizing action selection, the system achieves high‑reliability behavior without modifying the underlying model.
The system implements active inference across two layers:
This layered structure is consistent with the free energy principle but distributes the agency between a deterministic controller and a generative model, avoiding the need for the generative model to perform risky action selections.
The relational tags act as an explicit affine connection in the semantic manifold. They define which concepts are related and in what way, reducing the inference steps required to navigate from one page to another. This explicit connectivity is a form of inverse reinforcement learning: instead of inferring relations from embedding distances, the relations are given, lowering cognitive cost C(M).
The system maintains a near‑critical state by processing one URL at a time and re‑outputting the entire summary after each. This incremental update prevents the large, uncontrolled phase transitions that can occur when many new information units are injected at once. The human or scheduler acts as a damping control, allowing the system to approach the edge of chaos but not cross it into instability.
Because the sitemap is a deterministic structure, the URL selection can be automated by a simple program that iterates through the list. The only human decision required is the initial plan approval (if any) and the final verification; the intermediate “next” steps can be replaced by a loop that feeds the next URL automatically. The evaluation criteria are also formalizable:
The system’s design inherently mitigates common LLM risks:
While the current implementation targets static web pages via a sitemap, the same architecture can be extended:
Phase‑coded representation: replacing natural‑language summaries with phase‑angle vectors (as in RoSE embeddings) to increase information density further while preserving interference‑as‑superposition properties.
Self‑organized routing: applying ALF‑LB principles to let the system autonomously select the next URL based on current summary entropy, reducing human involvement to exception handling.
Distributed knowledge bases: extending the single‑summary model to a federation where multiple summaries are maintained with a shared consensus layer and explicit difference layers, enabling collaboration across autonomous agents.
The Structurizer is not a prompt in the conventional sense but a self‑contained operational framework that externalizes action selection, fixes premises through cumulative summarization, maximizes inferential information density via a compact schema, and embeds meta‑instructions to turn the output into an executable environment for subsequent reasoning. It is theoretically grounded in structural resonance, reverse alignment, active inference, information geometry, and self‑organized criticality, and it demonstrates how a tightly constrained, deterministic process can produce a reliable, auditable, and automatically extensible knowledge structure using only an LLM as a perception engine.
The system’s value lies in its ability to convert a set of loosely connected web pages into a coherent, high‑density, and self‑referential artifact that can serve as both a compressed archive and a launch point for further investigation—without modifying the underlying model, without incurring alignment taxes, and with the potential for full automation.