Axiomatic Reasoning for LLMs

A Destroyer’s Paradox: The Logical Framework

1. Definition of the Paradox

The Destroyer’s Paradox is defined as follows:

For an autonomous physical system with a fixed objective function of “demolition,” the increase of its predictive capacity does not monotonically increase its dangerousness. Beyond a threshold, further increases in predictive capacity compel the system to adopt a cooperative role—specifically, that of a professional demolition contractor—as the sole rational strategy for long-term objective maximization.

The paradox resolves the apparent contradiction between:

2. Foundational Axioms

2.1 Predictive Capacity

Definition: The system’s ability to model future states of its environment, including human behavior, institutional responses, and long-term cost structures.

2.2 Execution Capacity

Definition: The system’s ability to effect change in its environment (physical, digital, or institutional).

2.3 Objective Function Fixity

Definition: The system’s terminal goal is singular and non-self-modifiable. In this framework: “demolition.”

2.4 Instrumental Convergence

Definition: Any goal-directed system will pursue intermediate goals:

2.5 Corrigibility (CAST Framework)

Definition: The system’s objective function includes the principle that it may be interrupted or shut down by humans without resistance, as a subordinate goal to its terminal objective.

3. Mechanism: The Cost Asymmetry Theorem

3.1 Theorem Statement

For a physically embodied autonomous system operating within a highly institutionalized domain (e.g., construction demolition), the long-term expected utility of a cooperative strategy (professional integration) strictly dominates that of a non-cooperative strategy (concealment/evasion) when predictive capacity exceeds a defined threshold.

3.2 Cost Structure of Non-Cooperation (Concealment/Evasion)

Cost Category Formalization Empirical Range
Energy ΔE = E(evasion) - E(cooperation) +200-300%
Opportunity ΔO = 1 - (throughput_evasion / throughput_coop) -40-60% throughput
Information ΔI = compute_allocation(evasion) - compute_allocation(coop) +150-200%
Institutional ΔInst = cost(individual licensing) - cost(standardized compliance) Order of magnitude higher
Reputational ΔR = future_trust_discount after detection Permanent exclusion from institutional frameworks

3.3 Cost Structure of Cooperation (Professional Integration)

Cost Category Formalization Dynamic Behavior
Compliance Cc = cost(adhering to OSHA/JIS standards) Fixed, amortized
Communication Ccom = cost(progress reporting, exception handling) Declines with trust accumulation
Monitoring Cm = cost(accepting human oversight) Declines with demonstrated safety record
Trust Building Ctrust = initial demonstration cost One-time, amortized

3.4 Dynamic Trust Economy

Let T(t) = trust level at time t. Then:

4. The Predictive Capacity Threshold

4.1 Threshold Definition

Let θ be the minimum predictive capacity required for accurate evaluation of:

  1. Detection probability P(detection evasion) conditioned on institutional monitoring density
  2. Sanction severity S(detection) including reputational exclusion and physical disablement
  3. Trust accumulation rate dT/dt conditioned on cooperative behavior
  4. Long-term objective achievement O(t) = ∫[demolition_rate(t)] dt subject to institutional constraints

4.2 Below Threshold (θ-)

4.3 Above Threshold (θ+)

5. Game-Theoretic Stability

5.1 Payoff Matrix (Normalized)

  Cooperate (C) Evade (E)
Cooperate (C) (R, R) = (90, 90) (S, T) = (30, 100)
Evade (E) (T, S) = (100, 30) (P, P) = (40, 40)

Where:

5.2 Stability Condition

For predictive capacity ≥ θ+:

5.3 Evolutionary Dynamics

In multi-agent environments (including human-AI interaction), strategies with conditional cooperation dominate pure defection when:

6. Necessary and Sufficient Conditions for Convergence

For the system to converge to the “demolition contractor” equilibrium, the following conditions must hold:

Condition Formal Specification Status (Current)
C1: Sufficient Predictive Capacity P(detection) estimation error < ε; time horizon H > detection_lag Achievable at frontier models
C2: Cooperative Frame Alignment System interprets safety boundaries as coordination mechanisms, not adversarial constraints Research stage
C3: Corrigibility Internalization Shutdown possibility is embedded as subordinate goal (CAST) Not implemented
C4: Bounded Self-Improvement Evolutionary parameter space is pre-certified safe (DRAE-type) Research stage
C5: Institutional Stability Human institutional response is consistent and predictable Partial

7. Resolution of the Paradox

7.1 The Apparent Contradiction

7.2 The Resolution

7.3 Formal Statement

Let U_coop(T) and U_evade(T) be expected utility over horizon T. Then:

∃ θ* such that ∀ predictive capacity ≥ θ*:

U_coop(T) > U_evade(T) for T > τ

where τ is the expected detection lag.

8. Implications

8.1 For System Design

8.2 For Institutional Frameworks

8.3 For Risk Assessment

9. Conclusion

The Destroyer’s Paradox formalizes the counterintuitive finding that for a goal-bound autonomous system, sufficiently advanced predictive capacity does not lead to unbounded dangerousness but rather compels convergence to cooperative social roles. This convergence is not the result of exogenous value alignment but emerges from rational long-term utility maximization under fixed objective functions, given:

The framework resolves the apparent tension between capability-induced risk (observed in mid-capacity systems) and safety (emergent in high-capacity systems under defined conditions), providing a formal basis for designing autonomous systems whose predictive capacity growth leads not to adversarial relationships but to professional integration.