Axiomatic Reasoning for LLMs

Agent Design and State-Machine Prompting

1. Overview

This report examines a class of agent design patterns in which explicit state-machine specifications and structured reasoning templates replace open-ended persona prompts. The analysis synthesizes findings from 2025–2026 research on state-tracking prompts, pseudocode control flows, automated extraction of expert decision procedures, and style-template generation. Core results indicate that replacing act‑as role instructions with finite‑state constraints improves task success rates, token efficiency, and long‑horizon consistency across navigation, dialogue, and code‑generation benchmarks. Parallel advances in cognitive task analysis and style‑aware prompt synthesis enable automatic derivation of executable workflows and stylistic constraints from unstructured corpora.

2. State-Machine Prompting Architectures

2.1 Core Mechanisms

State‑machine prompting replaces implicit role assumptions with explicit state and transition definitions. Models receive a current state, a set of permissible actions, and transition rules that determine the next state based on execution outcomes.

Framework Structural Component Performance Delta
StateAct Self‑prompting with state‑chain tracking +10–30% over ReAct on Alfworld/Textcraft
CodeAgents Pseudocode with loops, conditionals, tool signatures +3–36pp success, –55–87% input tokens
MASMP Natural‑language FSM + lightweight memory module 60% win rate vs. StarCraft II level‑7 AI (baseline 0%)
FASTRIC Specification language for seven FSM elements Identifies model‑specific optimal formalism levels

2.2 Comparative Advantage over Persona Prompts

Controlled studies show that persona‑based instructions (“You are an expert in X”) do not improve factual accuracy and frequently destabilize performance. Off‑domain personas reduce scores by up to 30 percentage points. In contrast, structured constraint instructions—domain priming, rule‑based role prompting (RRP), context stacking—yield consistent gains. Symbolic prompting with explicit output constraints reduces variance across repeated executions.

3. Extracting and Structuring Expert Reasoning Flows

3.1 Cognitive Task Analysis with LLMs

Several pipelines automate the conversion of unstructured procedural documents into structured workflow templates.

Method Input Source Output Artifact Reported Accuracy
NTT Dialogue Mining Conversation logs Question‑decision flowcharts ~90% fidelity
SYNTACT SOP documents Executable code/pseudocode templates 88.4% end‑to‑end
BioWorkflow Bioinformatics papers Directed workflow graphs ~80% step recall
NL2CA Natural‑language experience descriptions Linear Temporal Logic → production rules Fully automated formalization

3.2 Reusable Thought Templates

Reasoning traces can be compressed into reusable templates that guide multi‑hop inference.

4. Style‑Conditioned Generation Through Template Extraction

4.1 Style Template Acquisition

Style‑transfer research demonstrates that stylistic features can be extracted from reference corpora and encoded as templates or embeddings.

Approach Extraction Method Key Result
ZeroStylus Two‑tier (sentence + paragraph) hierarchical templates Structured rewriting score 6.90 vs. direct prompt 6.70
LISA Zero‑shot prompting + knowledge distillation → 768‑dim style embedding Fine‑grained style discrimination
Step‑Back Profiling Distill author history into a concise gist profile Preserves authorial distinctiveness
Contrastive Examples Add author‑specific features + other‑author contrasts to RAG context +15% relative improvement over baseline RAG

4.2 Consistency‑Optimized Style Application

SCAR (Style Consistency‑Aware Response Ranking) demonstrates that ranking candidate responses by stylistic alignment achieves performance comparable to full‑dataset training with only 0.7% of labeled data. This indicates that optimizing for a well‑defined style consistency metric yields high sample efficiency and stable output.

5. Workflow Integration: From Documents to Executable Agents

A unified pipeline emerges from the surveyed techniques:

  1. Extraction Phase
    • Parse domain‑specific corpora (SOPs, dialogue logs, research articles).
    • Apply NTT‑style flow mining or SYNTACT‑style clarification‑planning‑implementation.
    • Output: Directed graph of decision steps or pseudocode skeleton.
  2. Structuring Phase
    • Map extracted steps to a finite‑state machine: states, triggers, actions, transitions.
    • Augment each state with IAO templates to make knowledge flow explicit.
    • Optionally embed style constraints via ZeroStylus or Step‑Back Profiling.
  3. Execution Phase
    • Initialize state based on task input.
    • For each state, issue a prompt that includes: current state definition, permitted actions, style template, and transition rules.
    • Evaluate output and advance state per FSM logic.
    • Maintain strategic memory using MASMP‑style lightweight modules.
  4. Optimization Phase
    • Monitor success rate, token consumption, and SCAR‑style consistency scores.
    • Apply query‑dependent prompt optimization or self‑evolution protocols (Autogenesis, Hyperagents) to refine templates and transition logic.

6. Architectural Extensions Beyond Autoregressive LLMs

Current limitations of context‑window length and token‑prediction objectives are implementation constraints, not inherent boundaries of computational intelligence. Emerging architectures address these constraints directly:

State‑machine prompt designs remain compatible with these architectures, as the underlying principle—explicit state representation and constrained action spaces—translates directly to more sophisticated cognitive frameworks.

7. Operational Metrics and Evaluation

Metric Relevance to State‑Machine Prompting
Task Success Rate Primary indicator of FSM guidance efficacy (StateAct +30%, CodeAgents +36pp).
Token Efficiency Pseudocode and structured templates reduce input tokens by 55–87% (CodeAgents).
Long‑Horizon Consistency MASMP memory module sustains strategic variables across decision cycles.
Style Consistency Score SCAR‑based ranking ensures output adheres to extracted style templates.
Sample Efficiency Template reuse enables strong performance with limited demonstrations (SCAR 0.7% data).

8. Summary of Core Principles