Axiomatic Reasoning for LLMs

An Orchestration Strategy

1. Core Abstraction

A three‑layer architecture separates what a system must achieve from how it is realized, then distributes execution across parallel modules whose coordination overhead grows logarithmically while total compute scales multiplicatively.

Layer 0 – Intent: High‑level objective expressed in natural language or constrained form.
Layer 1 – Structural Specification: Machine‑interpretable contract (schemas, invariants, dependency rules) that defines interfaces and allowed interactions.
Layer 2 – Execution: Parallel modules that consume the specification and produce conforming artifacts (code, data, decisions).

The specification acts as a fixed premise—once validated, it becomes immutable for the current phase, enabling local reasoning and preventing context drift.

2. Decomposition Pattern

Intent is decomposed via a hierarchical task network (HTN) pattern with four primitive forms:

Pattern	Behavior	Parallelizability
Sequential	Ordered subtasks, output feeds next	None
Parallel	Independent subtasks, results aggregated	Full
Conditional	Branching based on intermediate state	Partial
Iterative	Repeated application until convergence	Loop‑level only

The decomposition engine selects patterns based on task dependency graph analysis. For any task, the decomposition is expressed as:

specification = { id, version, input_schema, output_schema, invariants: [condition_1, …, condition_n], decomposition: { pattern, subtasks, preconditions, postconditions }, dependencies: [{ target, type, version_constraint }] }

3. Parallel Execution Model

The middle layer consists of N intermediate modules, each responsible for a disjoint subset of the decomposed specification. Modules execute independently with:

No shared mutable state – all communication passes through the specification interface
Spectral separation – each module’s representational capacity is orthogonalized to prevent semantic interference (dominant component overlap < 0.1)
Dynamic routing – input distribution adapts to score distribution shape; flat distributions trigger load‑balancing, peaked distributions trigger specialization

Lower‑level modules (implementation layer) follow the same pattern recursively, forming a tree whose depth is determined by granularity requirements.

4. Load Balancing and Resource Allocation

Specialization creates natural load imbalance. The system adjusts through:

Score‑adaptive routing (LASER) –

If gate scores show strong preference → route to strongest expert (preserve accuracy)
If scores are nearly uniform → expand candidate set, route to least loaded (balance latency)

Least‑loaded expert parallelism (LLEP) –
When a device becomes overloaded, its tokens and the corresponding expert parameters are dynamically relocated to underutilized devices. This yields:

Speedup factor: up to 5× under imbalance
Peak memory reduction: up to 4×
No accuracy degradation

5. Convergence Detection

A parallel system has converged when:

Intra‑module uncertainty → each module’s output distribution is delta‑concentrated
Inter‑module agreement → all modules map their outputs to the same semantic cluster

These are quantified as Collaborative Entropy (CoE) : U = mean(SE(x_i)) + sum(w_i * KL(p_i || p_bar))

Convergence is declared when U approaches zero. The verification gate at each layer validates conformance to specification before promoting results to the next layer.

6. Complexity Bounds

Let:

P = number of refinement phases
N_p = number of parallel modules in phase p
T_mid(i,p) = execution time of module i in phase p
T_low(ij,p) = execution time of submodule j under module i
V_p = verification time (hierarchically aggregated)

Total wall‑clock time: T_total = sum_over_p( max_i( T_mid(i,p) + max_j(T_low(ij,p)) ) + O(log N_p) + V_p )

Total compute: C_total = sum_over_p( sum_i C_mid(i,p) + sum_i sum_j C_low(ij,p) )

Because max_i and max_j are independent of N_p (parallel modules do not wait for each other), increasing N_p exponentially does not increase T_total beyond the linear factor introduced by additional phases P. Compute grows multiplicatively with N_p; time grows linearly with P.

7. Scaling Limits and Mitigations

Constraint	Critical Value	Mitigation
Inter‑module communication	N ~ 16–32	Hierarchical sub‑grouping
Agreement verification	N ~ 128	Cluster‑then‑cluster hierarchical aggregation (O(N log N))
Specification complexity	~MB scale	Modular references, shared schemas
Human validation latency	P ~ 10–20	Automated conformance gates, confidence thresholds

8. Relationship to Computational Irreducibility

Certain computations cannot be predicted without execution. This framework treats such irreducibility not as a limitation but as the reason to parallelize: the only way to determine behavior is to execute, and execution time is bounded by the slowest module while total compute scales with the number of modules.

This aligns with the negentropy maximization principle: semantic interference density increases when parallel modules operate on orthogonal subspaces, converting potential collisions into structured knowledge.

9. Summary

The orchestration strategy defines:

A three‑layer hierarchy (Intent → Specification → Execution) with fixed premises at each phase
Four decomposition patterns (Sequential, Parallel, Conditional, Iterative) selected by task structure
Parallel execution with spectral separation, dynamic routing, and least‑loaded redistribution
Convergence detection via collaborative entropy (intra‑ and inter‑module agreement)
Complexity bounds that decouple time (linear in phases) from compute (multiplicative in module count)

The architecture does not eliminate the need for human intent articulation or validation, but reduces human involvement to those two functions while automating decomposition, distribution, execution, and verification. The result is a system where parallelism scales compute without scaling time, bounded only by communication overhead and the irreducible cost of validation.

This site is open source. Improve this page.