Axiomatic Reasoning for LLMs

This is a snapshot of the plan as of 2026/04/02

Implementation Plan in Python

1. Overview

This document defines a technical implementation plan for applying Deep Coding—a specification-driven, human–AI collaborative development methodology—to Python software projects. The plan translates the core principles of Deep Coding into concrete Python constructs, toolchains, and phase-based workflows.

The implementation rests on four technical pillars:

  1. A structural specification layer built on Python's type system (section 2.1).
  2. A skeleton–tissue architecture separating human-maintained and AI-generated code (section 2.2).
  3. A generative conformance loop gated by automated verification (section 2.3).
  4. Fixed premise management across development phases (section 2.4).

All execution steps, evaluation metrics, and verification gates are defined in Python-native terms, avoiding philosophical framing in favor of actionable technical procedures.


2. Core Technical Components

2.1 Structural Specification Layer

The specification is expressed using Python's static and runtime type systems: typing.Protocol classes define structural interfaces verified statically by mypy, and pydantic models define data shapes validated at runtime.

Example specification fragment:

from typing import Any, Protocol
from pydantic import BaseModel

class EntitySpec(Protocol):
    def update(self, delta_time: float) -> None: ...
    def render(self, surface: Any) -> None: ...

class GameState(BaseModel):
    score: int
    lives: int
    invincible_frames: int = 0
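
As a quick illustration of the runtime half of this layer, the GameState model above rejects non-conforming data at construction time. A minimal sketch, assuming pydantic v2 and restating the model so the snippet is self-contained:

```python
from pydantic import BaseModel, ValidationError

class GameState(BaseModel):
    score: int
    lives: int
    invincible_frames: int = 0

# Valid data passes and defaults are filled in.
state = GameState(score=0, lives=3)
assert state.invincible_frames == 0

# Non-conforming data fails fast at the validation gate.
try:
    GameState(score="high", lives=3)
except ValidationError as exc:
    print(f"rejected: {len(exc.errors())} validation error(s)")
```

This is the same check the Runtime Validation gate (section 2.3, step 4) exercises in bulk across all data models.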

2.2 Skeleton–Tissue Architecture

Skeleton (Manually Maintained, AI-Inaccessible): the execution flow and abstract interfaces (for example, a main loop and the abstract methods it calls), owned and edited only by humans.

Tissue (AI-Generated, Regenerable): the concrete implementations of the skeleton's abstract hooks, which can be discarded and regenerated from the specification at any time.

The skeleton guarantees architectural invariants: because the skeleton owns the execution flow (e.g., a run method that calls abstract methods), any AI-generated tissue code is constrained by the skeleton’s calling order and error boundaries.
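
A minimal sketch of this constraint, with illustrative class and method names (the plan does not mandate these):

```python
from abc import ABC, abstractmethod

class EngineSkeleton(ABC):
    """Skeleton code: owns execution flow; tissue fills in the abstract hooks."""

    def run(self, frames: int) -> None:
        for _frame in range(frames):
            try:
                # Invariant enforced by the skeleton: update always
                # precedes render within a frame.
                self.update(delta_time=1 / 60)
                self.render(surface=None)
            except Exception as exc:
                # Skeleton-owned error boundary: failures in tissue code
                # cannot escape the frame loop.
                self.on_error(exc)

    @abstractmethod
    def update(self, delta_time: float) -> None: ...

    @abstractmethod
    def render(self, surface) -> None: ...

    def on_error(self, exc: Exception) -> None:
        print(f"frame error contained: {exc!r}")
```

Tissue code subclasses EngineSkeleton and implements only update and render; it has no way to reorder the calls or bypass the error boundary.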

2.3 Generative Conformance Loop

For each module or feature, the generation process follows:

  1. Specification Update: Modify the structural specification (protocols, pydantic models, invariants).
  2. Generation Request: The AI generates implementation code that conforms to the updated specification.
  3. Static Verification: Run mypy --strict to verify type conformance.
  4. Runtime Validation: Execute pydantic model validations and unit tests.
  5. Commit: If all gates pass, commit the new specification and generated code.

Any failure at steps 3 or 4 triggers a regeneration attempt. The system does not proceed to the next phase until verification passes.
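
The loop above can be sketched as a driver function. The gate commands come from section 4; generate_tissue is a hypothetical callback standing in for the AI invocation described in section 5:

```python
import subprocess

MAX_ATTEMPTS = 3  # aligns with the "Regeneration Iterations <= 3" metric

def run_gate(cmd: list[str]) -> bool:
    """Run one verification gate; the pass condition is a zero exit code."""
    return subprocess.run(cmd).returncode == 0

def conformance_loop(generate_tissue, gate=run_gate, max_attempts=MAX_ATTEMPTS) -> bool:
    """Drive steps 2-5; returns True once all gates pass, False if blocked."""
    gates = [
        ["mypy", "--strict", "."],  # step 3: static verification
        ["pytest"],                 # step 4: runtime validation
    ]
    for _attempt in range(max_attempts):
        generate_tissue()  # step 2: generation request (hypothetical callback)
        if all(gate(cmd) for cmd in gates):
            return True    # step 5: caller commits spec + generated code
    return False           # regeneration budget exhausted; phase is blocked
```

Injecting the gate runner keeps the loop testable without invoking the real toolchain.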

2.4 Fixed Premise Management

After each development phase, a structured summary is committed as a fixed premise. The summary includes:

This summary is stored in a version-controlled file (e.g., FIXED_PREMISE.md or as a structured YAML file). Subsequent phases treat this summary as immutable. Changes to the specification must be explicitly planned and approved before the next phase.
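
A sketch of committing such a summary as a structured file. The field names are illustrative, and JSON is used here only to stay within the standard library (the plan equally allows Markdown or YAML):

```python
import json
from pathlib import Path

def commit_fixed_premise(phase: int, spec_files: list[str], decisions: list[str]) -> Path:
    """Write the phase summary that subsequent phases treat as immutable."""
    premise = {
        "phase": phase,
        "specification_files": spec_files,  # protocols/models frozen this phase
        "decisions": decisions,             # approved deltas, now fixed
    }
    path = Path(f"fixed_premise_phase_{phase}.json")
    path.write_text(json.dumps(premise, indent=2))
    return path  # caller commits this file to version control
```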


3. Phase-Based Implementation Workflow

Development proceeds in phases, each consisting of the following steps:

Phase Template

  1. Plan: Define the specification changes for this phase.
  2. Approval: Human reviews and approves the specification delta.
  3. Generation: AI generates implementation code from the updated specification.
  4. Static Gate: mypy --strict passes with zero errors.
  5. Test Gate: pytest passes all existing tests plus new tests for added functionality.
  6. Validation Gate: pydantic runtime validation passes for all data models.
  7. Summary: Generate and commit the fixed premise for this phase.
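
The ordering constraint in this template can be sketched as a small step tracker (names illustrative; the plan only requires that each step complete before the next):

```python
from dataclasses import dataclass, field

# The seven template steps, in their mandated order.
STEPS = ["plan", "approval", "generation", "static_gate",
         "test_gate", "validation_gate", "summary"]

@dataclass
class Phase:
    name: str
    completed: list[str] = field(default_factory=list)

    def complete(self, step: str) -> None:
        """Mark a step done; out-of-order completion is rejected."""
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"step {step!r} attempted before {expected!r}")
        self.completed.append(step)

    @property
    def done(self) -> bool:
        return self.completed == STEPS
```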

Example Phase Sequence for a Python Game Engine

Phase 1: Core Engine Skeleton

Phase 2: Player and Input Handling

Phase 3: Enemies and Collision

Phase 4: Data-Driven Stage System

Phase 5: UI and Editor Integration

Phase 6: Optimization and Polish


4. Technical Verification Gates

All generated code must pass the following automated checks before phase completion:

| Gate | Tool | Command | Pass Condition |
|---|---|---|---|
| Type Safety | mypy | mypy --strict --no-implicit-optional . | 0 errors |
| Linting | ruff | ruff check . | 0 errors, 0 warnings |
| Formatting | black | black --check . | No formatting changes required |
| Unit Tests | pytest | pytest --cov=. --cov-fail-under=80 | 100% pass rate, coverage >= 80% |
| Docstring Coverage | pydocstyle | pydocstyle . | 0 violations |
| Complexity | radon | radon cc -a -nb . | Average cyclomatic complexity < 10 |

These gates are implemented as pre-commit hooks and CI pipeline checks, ensuring that any generated code that fails verification is rejected before integration.


5. AI Integration Interface

The AI (LLM) is invoked through a structured prompt that provides:

The AI’s output is restricted to:

The AI does not have permission to modify:

This interface is implemented as a function call or API endpoint that wraps the LLM with these constraints.
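
A sketch of the write-side constraint such a wrapper would enforce, with an illustrative tissue/ vs skeleton/ layout (the plan does not fix directory names):

```python
from pathlib import Path

TISSUE_DIR = Path("tissue")      # illustrative: the only AI-writable area
SKELETON_DIR = Path("skeleton")  # AI-inaccessible, per the metric in section 6

def apply_generated_code(files: dict[str, str]) -> None:
    """Write AI output to disk, refusing anything outside the tissue area."""
    for rel_path, source in files.items():
        target = Path(rel_path)
        if SKELETON_DIR in target.parents or TISSUE_DIR not in target.parents:
            raise PermissionError(f"AI may not modify {rel_path}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)
```

The actual LLM call sits behind this function; whatever the model emits, only paths under tissue/ ever reach the working tree.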


6. Evaluation Metrics

The success of the implementation is measured by:

| Metric | Target | Measurement Method |
|---|---|---|
| Skeleton Modification Count | 0 per phase | Git diff on skeleton/ directory |
| Type Check Pass Rate | 100% | mypy exit code |
| Test Pass Rate | 100% | pytest exit code |
| Regeneration Iterations | <= 3 per phase | Count of generation attempts before gate pass |
| Specification Change Lead Time | < 15 minutes | Time from spec change to passing gates |
| Code Churn | < 5% per feature | Lines changed / lines added across phases |
| Human Intervention Ratio | < 10% of phases | Phases requiring human code edits / total phases |

These metrics are collected automatically via CI logs and version control history.
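
As an example of metric collection from version control, the skeleton modification count can be derived from git diff on the skeleton/ directory (phase tag names are illustrative):

```python
import subprocess

def count_changed_files(diff_output: str) -> int:
    """Count non-empty lines from a `git diff --name-only` listing."""
    return len([line for line in diff_output.splitlines() if line.strip()])

def skeleton_modification_count(phase_start: str, phase_end: str) -> int:
    """Files changed under skeleton/ between two phase tags (target: 0)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", phase_start, phase_end, "--", "skeleton/"],
        capture_output=True, text=True, check=True,
    )
    return count_changed_files(result.stdout)
```

Splitting the pure counting step from the git invocation keeps the metric testable without a repository.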


7. Toolchain Requirements

The required toolchain follows directly from the verification gates in section 4: mypy, ruff, black, pytest (with pytest-cov for coverage enforcement), pydantic, pydocstyle, and radon.

Optional for advanced metrics:


8. Boundary Conditions and Limitations

This implementation plan assumes:

When these conditions are not met:


9. Conclusion

This implementation plan provides a concrete, toolchain-integrated pathway for adopting Deep Coding in Python projects. By encoding architectural constraints in Python’s native type system, enforcing generative conformance through automated verification gates, and maintaining fixed premises across recursive refinement phases, the plan translates theoretical principles into executable development workflows.

The result is a development process where:

This is not a philosophical proposal but a technically executable methodology grounded in existing Python tools and established software engineering practices.