Generalized FSM Loop System¶

Overview¶

The little-loops (ll) loop system uses Finite State Machines (FSM) as the execution model for all loops. Users author loops directly as FSM YAML — defining states, transitions, and evaluators — and the engine executes them as-is.

This design provides:

- Consistency: Single execution engine, predictable behavior
- Debuggability: FSM state is always inspectable
- Composability: All loops share the same underlying structure
- Transparency: What you write is what the engine runs

Integration with Existing Tools¶

The FSM loop system complements the existing ll-auto and ll-parallel CLI tools:

Tool	Purpose	Relationship to FSM Loops
`ll-auto`	Sequential issue processing	Can be an action in FSM states
`ll-parallel`	Parallel issue processing with worktrees	Can be an action in FSM states
`ll-loop`	Execute FSM-based automation loops	Orchestrates any CLI tool as states

Example - using ll-auto as a step:

states:
  process_issues:
    action: "ll-auto --max-issues 5"
    on_yes: "verify"
    on_no: "done"

Common Loop Patterns¶

All loops are authored as FSM YAML in .loops/. Use /ll:create-loop to generate FSM YAML interactively, or write it directly. Common patterns and the FSM structures they produce:

Loop type	Best For	Mental Model
Fix until clean	Outcome-focused tasks	"Run check, fix if fails, repeat"
Drive a metric	Metric optimization	"Drive metric toward target"
Maintain constraints	Continuous compliance	"Keep these constraints true"
Run a sequence	Sequential workflows	"Do X, then Y, until Z"
Sub-loop composition	Multi-step pipelines	"Chain existing loops into a pipeline"

Loop Pattern Examples¶

1. Fix Until Clean¶

Run a check, apply a fix on failure, repeat until the check passes.

name: "goal-no-type-errors"
initial: "evaluate"
max_iterations: 20
states:
  evaluate:
    action: "/ll:check-code types"
    on_yes: "done"
    on_no: "fix"
  fix:
    action: "/ll:manage-issue bug fix"
    next: "evaluate"
  done:
    terminal: true

2. Drive a Metric¶

Measure a numeric value and apply fixes until it reaches a target.

name: "reduce-lint-errors"
initial: "measure"
context:
  # ruff's JSON output is an array of diagnostics, so count its elements
  metric_cmd: "ruff check src/ --output-format=json | jq 'length'"
  target: 0
  tolerance: 0
states:
  measure:
    action: "${context.metric_cmd}"
    capture: "current_value"
    evaluate:
      type: convergence
      target: "${context.target}"
      tolerance: "${context.tolerance}"
    route:
      target: "done"
      progress: "apply"
      stall: "done"
  apply:
    action: "/ll:check-code fix"
    next: "measure"
  done:
    terminal: true

3. Maintain Constraints¶

Check multiple independent constraints in order; fix each on failure.

name: "code-quality-guardian"
initial: "check_tests"
states:
  check_tests:
    action: "pytest"
    on_yes: "check_lint"
    on_no: "fix_tests"
  fix_tests:
    action: "/ll:manage-issue bug fix"
    next: "check_tests"
  check_lint:
    action: "ruff check src/"
    on_yes: "check_types"
    on_no: "fix_lint"
  fix_lint:
    action: "/ll:check-code fix"
    next: "check_lint"
  check_types:
    action: "mypy src/"
    on_yes: "all_valid"
    on_no: "fix_types"
  fix_types:
    action: "/ll:manage-issue bug fix"
    next: "check_types"
  all_valid:
    # For daemon mode, set next: check_tests instead of terminal: true
    terminal: true

4. Run a Sequence¶

Execute an ordered list of steps, then check an exit condition.

name: "fix-all-types"
initial: "step_0"
max_iterations: 20
backoff: 2
states:
  step_0:
    action: "/ll:check-code types"
    next: "step_1"
  step_1:
    action: "/ll:manage-issue bug fix"
    next: "check_done"
  check_done:
    action: "mypy src/"
    on_yes: "done"
    on_no: "step_0"
  done:
    terminal: true

5. FSM (Direct)¶

Full control over states and transitions for complex workflows.

name: "lint-fix-cycle"
initial: "check"
states:
  check:
    action: "/ll:check-code lint"
    on_yes: "done"
    on_no: "fix"
  fix:
    action: "/ll:check-code fix"
    next: "check"
  done:
    terminal: true
max_iterations: 10

No compilation needed - this is the native format.

6. Sub-Loop Composition¶

Invoke another loop as a child FSM from a parent state. The child runs to completion; its terminal verdict (success / failure) drives the parent's transition.

name: "quality-then-commit"
initial: "run_quality"
max_iterations: 5
states:
  run_quality:
    loop: "fix-quality-and-tests"    # Invokes .loops/fix-quality-and-tests.yaml
    context_passthrough: true        # Share parent context vars; merge child captures back
    on_success: "run_git"            # Alias for on_yes
    on_failure: "done"               # Alias for on_no
  run_git:
    loop: "issue-refinement-git"     # No context_passthrough — child runs isolated
    on_success: "done"
    on_failure: "done"
  done:
    terminal: true

Key fields:

- loop: <name> resolves to .loops/<name>.yaml (project) or the built-in catalog.
- context_passthrough: true passes all parent context and captured variables into the child loop; the child's captured values are merged back into the parent's captured namespace on completion.
- on_success / on_failure are aliases for on_yes / on_no, accepted in all states (not just sub-loop states).

Without context_passthrough the child loop runs with its own isolated context and its captured values are not available to the parent after it completes.

Typed parameter bindings (`parameters:` / `with:`)¶

A child loop can declare a typed input contract at the top level. Callers bind parent values explicitly instead of passing the whole context:

# child loop — declares what it needs
parameters:
  input:
    type: string          # string | integer | number | boolean | enum | path
    required: true
    description: "Issue ID(s) to refine"

# parent loop — binds only the declared input
states:
  refine_issue:
    loop: "recursive-refine"
    with:
      input: "${captured.input.output}"
    on_success: "next_state"
    on_failure: "skip"

with: is mutually exclusive with context_passthrough on the same state. Unknown with: keys and missing required parameters are caught at load time. Values support ${variable} interpolation; type validation runs after interpolation at runtime. For the full ParameterSpec definition see scripts/little_loops/fsm/schema.py.

Inheritance pattern (`from:`)¶

For variants that share a skeleton — e.g. APO loops that share category, iteration cap, default context, and a done: terminal state — declare a parent template and have children inherit it via a top-level from: field. The loader deep-merges parent into child before validation: scalars and lists override, dicts (context, states, route) merge recursively, and the from: key is stripped from the result. Cycles raise ValueError; missing parents raise FileNotFoundError.

# scripts/little_loops/loops/lib/apo-base.yaml — not runnable directly
name: apo-base
category: apo
max_iterations: 20
timeout: 3600
on_handoff: spawn
context:
  prompt_file: system.md
states:
  done:
    terminal: true

# scripts/little_loops/loops/apo-beam.yaml — inherits + overrides
name: apo-beam
from: lib/apo-base
initial: generate_variants
context:
  beam_width: 4
  target_score: 90
states:
  generate_variants: { ... }
  # `done` state and all parent scalars come from apo-base

Use from: for whole-loop skeleton reuse, fragment: for shared state structures, and loop: for sub-loop pipelining. See docs/guides/LOOPS_GUIDE.md#loop-template-inheritance-via-from for the full merge rules.
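
The merge rules described in this section (scalars and lists override, dicts merge recursively, the from: key is stripped) can be sketched in a few lines of Python. This is an illustration only, not the loader's actual code: deep_merge and apply_from are hypothetical names, and cycle detection and file lookup are left out.

```python
from copy import deepcopy


def deep_merge(parent: dict, child: dict) -> dict:
    """Merge child over parent: dicts recurse, scalars and lists are replaced."""
    merged = deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def apply_from(parent: dict, child: dict) -> dict:
    """Hypothetical helper: merge, then strip the `from` key from the result."""
    merged = deep_merge(parent, child)
    merged.pop("from", None)
    return merged


parent = {
    "name": "apo-base",
    "max_iterations": 20,
    "context": {"prompt_file": "system.md"},
    "states": {"done": {"terminal": True}},
}
child = {
    "name": "apo-beam",
    "from": "lib/apo-base",
    "initial": "generate_variants",
    "context": {"beam_width": 4},
    "states": {"generate_variants": {"action": "echo variants"}},
}
merged = apply_from(parent, child)
```

In this sketch the merged loop keeps the child's name, inherits max_iterations and the done state from the parent, and ends up with both context keys.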

Universal FSM Schema¶

All loops conform to this schema. Loading and validation are handled by the Python package little_loops.fsm.

Action Types¶

Actions can be either:

- Shell commands: pytest, ruff check src/, npm run build
- Claude Code slash commands: /ll:manage-issue, /ll:check-code fix

The executor detects slash commands by the leading / and routes them appropriately.

Schema Definition¶

# Required
name: string                    # Unique loop identifier
initial: string                 # Starting state name
states:                         # State definitions
  <state_name>:
    action: string              # Command to execute (optional for decision states)

    # --- Evaluation Layer ---
    evaluate:                   # How to evaluate the action result
      type: string              # exit_code, output_numeric, output_json,
                                # output_contains, llm_structured, convergence,
                                # diff_stall, mcp_result, harbor_scorer
      # ... type-specific fields (see Evaluator Types)

    # --- Routing Layer ---
    # Option 1: Shorthand for common cases
    on_yes: string              # Next state on yes verdict
    on_no: string               # Next state on no verdict
    on_error: string            # Next state on error verdict
    on_<verdict>: string        # Custom shorthand for any verdict (e.g., on_done, on_retry)
                                # Used with llm_structured evaluators that return non-standard verdicts

    # Option 2: Full routing table (overrides shorthand)
    route:                      # Map verdict strings to next states
      <verdict>: string         # e.g., success: "deploy", blocked: "escalate"
      _: string                 # Default for unmatched verdicts
      _error: string            # Evaluation/execution errors

    # --- Other State Properties ---
    next: string                # Unconditional transition (no evaluation)
    terminal: boolean           # True if this is an end state
    capture: string             # Variable name to store output
    timeout: number             # Action-level timeout in seconds
    fragment: string            # Name of a fragment to inherit fields from (parse-time only;
                                # state-level keys override fragment keys at every nesting level)
    agent: string               # Subprocess agent name; passes --agent <name> to the Claude subprocess (prompt states only)
    tools: list                 # Subprocess tool scope; passes --tools <csv> to the Claude subprocess (prompt states only)

# Optional Loop-Level Settings
description: string             # Human-readable summary shown by ll-loop list and /ll:review-loop
                                # (recommended; ll-loop validate emits a WARNING if absent — required for
                                # debug-loop-run / audit-loop-run goal-alignment assessment)
context: object                 # Shared variables/config
scope: array[string]            # Paths this loop operates on (for concurrency)
max_iterations: integer         # Safety limit (default: 50)
backoff: number                 # Seconds between iterations
timeout: number                 # Max total runtime in seconds (loop-level)
default_timeout: number         # Default per-state action timeout in seconds (overridden by state-level timeout:)
maintain: boolean               # Restart after completion
category: string                # Grouping category for loop discovery and ll-loop list filtering
                                # (e.g. 'apo', 'code-quality', 'issue-management')
labels: array[string]           # Arbitrary tags for loop filtering with ll-loop list --label
commands:                       # Override Commands section in ll-loop show output (optional)
  - cmd: string                 #   Full command string (e.g. "ll-loop run my-loop --param x=1")
    comment: string             #   Short description shown as comment
import: array[string]           # Fragment library files to load, resolved relative to the loop
                                # file's directory (e.g. ["lib/common.yaml"]). Loaded before
                                # states are parsed.
fragments: object               # Inline named fragment definitions — keys are fragment names,
                                # values are partial state dicts. Local fragments override
                                # same-name imported fragments.

# Per-Loop Config Overrides (optional)
config:
  handoff_threshold: integer      # Override LL_HANDOFF_THRESHOLD for this loop (1-100)
  commands:
    confidence_gate:
      readiness_threshold: integer  # Override readiness gate threshold (1-100)
      outcome_threshold: integer    # Override outcome confidence threshold (1-100)
  automation:
    max_continuations: integer    # Override max continuation count (>=1)
  continuation:
    max_continuations: integer    # Alias for automation.max_continuations
# Precedence: CLI flags > YAML config block > global ll-config > schema defaults
# All fields are optional; omitted fields inherit the global ll-config value.

# LLM Evaluation Settings
llm:
  model: string                 # Model for LLM evaluation (default: DEFAULT_LLM_MODEL from schema.py)
  max_tokens: integer           # Max tokens for evaluation (default: 256)
  timeout: number               # Timeout for LLM calls in seconds (default: 1800)

Two-Layer Transition System¶

The FSM executor uses a two-layer system that cleanly separates concerns:

┌─────────────────────────────────────────────────────────────┐
│                     State Execution                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Layer 1: EVALUATE                                           │
│  "Given the action output, what happened?"                   │
│  → Produces a structured result with verdict string          │
│                                                              │
│  Layer 2: ROUTE                                              │
│  "Given the verdict, where do I go next?"                    │
│  → Maps verdict to next state                                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Why Two Layers?¶

Single Responsibility: Each layer does one job well.

- Evaluators answer "what happened?" without knowing about states
- Routing answers "where next?" without knowing how verdicts were produced

Testable: Test each layer in isolation.

- Test evaluator: "Given this output, does it produce the right verdict?"
- Test routing: "Given this verdict, does it go to the right state?"

Extensible: Add new evaluators without touching routing logic.

Layer 1: Evaluate¶

The evaluate block defines how to interpret an action's output. Every evaluator produces a result object:

result:
  verdict: string       # The routing key (e.g., "yes", "no", "target", "blocked")
  details: object       # Evaluator-specific data (available as ${result.*})

Default Evaluation by Action Type¶

Action Type	Default Evaluator	Rationale
Shell command	`exit_code`	Standard Unix semantics
Slash command	`llm_structured`	Natural language output
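
As a rough sketch of how the default could be chosen, the function below returns an explicit evaluate.type when one is given and otherwise falls back on the leading "/" rule. The name evaluator_type is hypothetical and the real executor may apply further checks.

```python
def evaluator_type(state: dict) -> str:
    """Return the evaluator type for a state, applying the defaults above."""
    explicit = state.get("evaluate", {}).get("type")
    if explicit:
        return explicit
    # Slash commands are recognised by their leading "/"
    if state.get("action", "").startswith("/"):
        return "llm_structured"
    return "exit_code"
```

Under this rule as sketched, a shell command invoked by absolute path would also be classified as a slash command; how the executor treats that case is not covered here.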

Explicit Evaluation¶

states:
  check:
    action: "pytest --tb=short"
    evaluate:
      type: exit_code
    # Result: { verdict: "yes" | "no" | "error", details: { exit_code: 0 } }

Layer 2: Route¶

The route block maps verdicts to next states. It's just a dictionary lookup.

Shorthand Syntax¶

For simple success/failure routing, use the shorthand:

states:
  check:
    action: "pytest"
    on_yes: "deploy"
    on_no: "fix"
    on_error: "alert"

This shorthand is equivalent to the following route table:

states:
  check:
    action: "pytest"
    route:
      yes: "deploy"
      no: "fix"
      error: "alert"

An additional shorthand, on_blocked, routes when the evaluator returns a blocked verdict (i.e., the action cannot proceed without external intervention):

states:
  fix:
    action: "/ll:manage-issue bug fix"
    on_yes: "verify"
    on_no: "fix"
    on_blocked: "escalate"

on_blocked is resolved alongside on_yes/on_no/on_error in the shorthand lookup. It is equivalent to adding blocked: "escalate" to a full route table and is the recommended way to handle blocked states without switching to a full route table. If a blocked verdict is returned and no on_blocked target is defined, the loop terminates with a fatal routing error, so define on_blocked on any state whose action can return a blocked verdict.

Full Route Table¶

When you need more than three outcomes:

states:
  fix:
    action: "/ll:manage-issue bug fix"
    evaluate:
      type: llm_structured
    route:
      yes: "verify"
      no: "fix"
      blocked: "escalate"
      partial: "probe"
      _: "fix"              # Default for any other verdict

Special Route Keys¶

Key	Meaning
`_`	Default route for unmatched verdicts
`_error`	Route for evaluation/execution errors (overrides loop-level `on_error`)

Resolution Order¶

1. If next is present → unconditional transition (no evaluation)
2. If route is present → use full routing table
3. If on_yes/on_no/on_error/on_blocked or any custom on_<verdict> shorthand → use shorthand routing
4. If terminal: true → end loop
5. Otherwise → error (no valid transition)
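
The resolution order can be sketched as a single function. This is a simplified illustration with a hypothetical name (resolve_next); it ignores the _error key, the on_success / on_failure aliases, and the on_error precedence rule for failing actions.

```python
from typing import Optional


def resolve_next(state: dict, verdict: Optional[str] = None) -> Optional[str]:
    """Return the next state name, or None when the loop should end."""
    # 1. Unconditional transition
    if "next" in state:
        return state["next"]
    # 2. Full routing table, with "_" as the default for unmatched verdicts
    route = state.get("route")
    if route is not None:
        if verdict in route:
            return route[verdict]
        if "_" in route:
            return route["_"]
        raise LookupError(f"no route for verdict {verdict!r}")
    # 3. Shorthand keys: on_yes, on_no, on_error, on_blocked, on_<verdict>
    shorthand = f"on_{verdict}"
    if shorthand in state:
        return state[shorthand]
    # 4. Terminal state ends the loop
    if state.get("terminal"):
        return None
    # 5. No valid transition
    raise LookupError(f"no valid transition for verdict {verdict!r}")
```

Because routing is a plain lookup, it can be tested without running any action: pass a state dict and a verdict, and check the returned state name.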

on_error precedence: next and on_error are not mutually exclusive and can be combined on the same state, with next as the happy-path unconditional hop and on_error as error recovery. next is followed only when the action exits with code zero and raises no exception. If a shell action exits non-zero, or an action raises an unhandled Python exception (e.g. an MCP tool failure, a late-caught interpolation error, or an error from a contributed ActionRunner), and on_error is defined, the executor routes to on_error instead of next. An action_error event is emitted whenever an exception is routed through on_error.

Evaluator Types¶

Evaluators produce verdicts from action output. The executor uses LLM structured output only for the llm_structured evaluator—all others are deterministic.

Tier 1: Deterministic Evaluators¶

No API calls. Fast, free, reproducible.

`exit_code` (Default for Shell Commands)¶

evaluate:
  type: exit_code

Exit Code	Verdict
0	`yes`
1	`no`
2+	`error`

Result details: { exit_code: <int> }
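
A minimal sketch of this mapping in Python (the function name is illustrative; treating negative exit codes as error is an assumption, since the table only covers 0, 1, and 2+):

```python
def evaluate_exit_code(exit_code: int) -> dict:
    """Map a process exit code to a verdict following the table above."""
    if exit_code == 0:
        verdict = "yes"
    elif exit_code == 1:
        verdict = "no"
    else:
        # 2+ per the table; negative codes also land here (assumption)
        verdict = "error"
    return {"verdict": verdict, "details": {"exit_code": exit_code}}
```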

`output_numeric`¶

Parse stdout as a number and compare.

evaluate:
  type: output_numeric
  operator: le        # eq, ne, lt, le, gt, ge
  target: 5

Comparison Result	Verdict
Condition met	`yes`
Condition not met	`no`
Parse error	`error`

Result details: { value: <number>, target: <number>, operator: <string> }
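
One way to picture this evaluator is a float parse followed by a lookup in Python's operator module. The sketch below is an assumption about the shape of the logic, not the project's code; the function name and the error detail field are illustrative.

```python
import operator

# Operator names from the YAML map directly onto the operator module
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def evaluate_output_numeric(output: str, op: str, target: float) -> dict:
    """Parse stdout as a number and compare it with the target."""
    try:
        value = float(output.strip())
    except ValueError:
        return {"verdict": "error", "details": {"error": "stdout is not a number"}}
    met = OPERATORS[op](value, target)
    return {
        "verdict": "yes" if met else "no",
        "details": {"value": value, "target": target, "operator": op},
    }
```

Note that multi-line output such as "0\n0" does not parse as a single number in this sketch and yields error, so the action should print exactly one value.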

`output_json`¶

Parse JSON and extract a value.

evaluate:
  type: output_json
  path: ".summary.failed"    # jq-style path
  operator: eq
  target: 0

Result details: { value: <any>, path: <string>, target: <any> }
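
The following sketch shows the idea for simple dotted paths. It is deliberately limited: only object keys are supported (no array indexing or jq filters), and returning error for invalid JSON or a missing key is an assumption, since no verdict table is given for this evaluator. Both function names are hypothetical.

```python
import json
import operator


def extract_path(data, path: str):
    """Follow a dotted jq-style path such as ".summary.failed" (keys only)."""
    current = data
    for key in path.strip(".").split("."):
        if key:
            current = current[key]
    return current


def evaluate_output_json(output: str, path: str, op: str, target) -> dict:
    """Extract a value from JSON stdout and compare it with the target."""
    try:
        value = extract_path(json.loads(output), path)
    except (json.JSONDecodeError, KeyError, TypeError):
        # Assumption: parse and lookup failures map to an "error" verdict
        return {"verdict": "error", "details": {"path": path}}
    met = getattr(operator, op)(value, target)  # eq, ne, lt, le, gt, ge
    return {
        "verdict": "yes" if met else "no",
        "details": {"value": value, "path": path, "target": target},
    }
```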

`output_contains`¶

Pattern matching on stdout.

evaluate:
  type: output_contains
  pattern: "All tests passed"    # Substring or regex
  negate: false                  # If true, success when NOT found

Match Result	Verdict
Pattern found (negate=false)	`yes`
Pattern not found (negate=false)	`no`
Pattern found (negate=true)	`no`
Pattern not found (negate=true)	`yes`

Result details: { matched: <bool>, pattern: <string>, negate: <bool> }
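
The four rows of the table reduce to a single comparison between "matched" and "negate". The sketch below treats every pattern as a regular expression via re.search, which is an assumption about how "substring or regex" is implemented; a plain substring without metacharacters behaves the same either way.

```python
import re


def evaluate_output_contains(output: str, pattern: str, negate: bool = False) -> dict:
    """Search stdout for the pattern; negate flips the verdict."""
    matched = re.search(pattern, output) is not None
    # yes when (found and not negate) or (not found and negate)
    verdict = "yes" if matched != negate else "no"
    return {
        "verdict": verdict,
        "details": {"matched": matched, "pattern": pattern, "negate": negate},
    }
```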

`convergence`¶

Compare current value to previous value and target. Used by the Drive a Metric pattern.

evaluate:
  type: convergence
  target: 0
  tolerance: 0              # Optional: success when within tolerance
  previous: "${prev.output}" # Previous measurement
  direction: minimize       # minimize (default) or maximize

Scenario	Verdict
Value within tolerance of target	`target`
Value improved toward target	`progress`
Value unchanged or worsened	`stall`

Result details: { current: <number>, previous: <number>, target: <number>, delta: <number> }
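
A possible reading of the table in Python follows. Two points are assumptions rather than documented behavior: "improved" is taken to mean a strictly lower value for minimize (strictly higher for maximize), and a first measurement with no previous value is reported as progress so the loop can continue. The function name is illustrative.

```python
from typing import Optional


def evaluate_convergence(
    current: float,
    target: float,
    previous: Optional[float] = None,
    tolerance: float = 0.0,
    direction: str = "minimize",
) -> dict:
    """Classify a measurement as target, progress, or stall."""
    if abs(current - target) <= tolerance:
        verdict = "target"
    elif previous is None:
        # Assumption: no earlier measurement, so let the loop proceed
        verdict = "progress"
    elif direction == "minimize":
        verdict = "progress" if current < previous else "stall"
    else:
        verdict = "progress" if current > previous else "stall"
    delta = None if previous is None else current - previous
    return {
        "verdict": verdict,
        "details": {
            "current": current,
            "previous": previous,
            "target": target,
            "delta": delta,
        },
    }
```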

`diff_stall`¶

Detect when a fix loop is spinning without making filesystem changes by comparing git diff --stat between iterations.

evaluate:
  type: diff_stall
  scope: ["src/", "tests/"]    # Optional: paths to pass to git diff --stat (default: entire repo)
  max_stall: 1                  # Optional: consecutive no-change iterations before `no` verdict

Scenario	Verdict
First call (no prior snapshot)	`yes`
Diff changed since last iteration	`yes`
Diff unchanged, stall count < max_stall	`yes`
Diff unchanged, stall count >= max_stall	`no`
git unavailable or command failed	`error`

Result details: { stall_count: <int>, max_stall: <int>, diff_changed: <bool> }
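
The stall counter can be sketched as a small stateful object. To keep the example self-contained it takes the git diff --stat text as an argument instead of running git, so the error row is not modeled. The class name is hypothetical, and incrementing the counter before comparing it with max_stall is an assumption about the exact ordering.

```python
class DiffStallTracker:
    """Track consecutive iterations whose diff snapshot did not change."""

    def __init__(self, max_stall: int = 1) -> None:
        self.max_stall = max_stall
        self.last_diff = None  # no snapshot before the first call
        self.stall_count = 0

    def evaluate(self, diff_stat: str) -> dict:
        first_call = self.last_diff is None
        changed = first_call or diff_stat != self.last_diff
        self.last_diff = diff_stat
        if changed:
            self.stall_count = 0
        else:
            self.stall_count += 1
        stalled = not changed and self.stall_count >= self.max_stall
        return {
            "verdict": "no" if stalled else "yes",
            "details": {
                "stall_count": self.stall_count,
                "max_stall": self.max_stall,
                "diff_changed": changed,
            },
        }
```

With max_stall: 2, the first call and the first unchanged iteration both yield yes, the second unchanged iteration yields no, and any change resets the counter.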

Tier 2: LLM Evaluator¶

Invokes the resolved host CLI (e.g. claude) with a JSON schema to obtain structured output. This is the only place in the FSM system that uses LLM structured output.

`llm_structured` (Default for Slash Commands)¶

evaluate:
  type: llm_structured
  prompt: "Did this fix attempt succeed?"   # Optional custom prompt
  schema:                                    # Optional custom schema
    type: object
    properties:
      verdict:
        type: string
        enum: ["yes", "no", "blocked", "partial"]
      confidence:
        type: number
      reason:
        type: string
    required: ["verdict", "confidence", "reason"]
  min_confidence: 0.7                        # Threshold for confident verdicts

Default Schema¶

When no schema is provided:

schema:
  type: object
  properties:
    verdict:
      type: string
      enum: ["yes", "no", "blocked", "partial"]
      description: |
        - yes: The condition/check evaluated to true
        - no: The condition/check evaluated to false
        - blocked: Cannot proceed without external help
        - partial: Made progress but not complete
    confidence:
      type: number
      minimum: 0
      maximum: 1
      description: "Confidence in this verdict (0-1)"
    reason:
      type: string
      description: "Brief explanation"
  required: ["verdict", "confidence", "reason"]

Confidence Handling¶

The evaluator adds a confident flag to the result:

result:
  verdict: "yes"           # From LLM response
  details:
    confidence: 0.85
    confident: true            # confidence >= min_confidence
    reason: "Fixed the type error in handlers.py"

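Setting min_confidence by itself does not change routing: the verdict remains the routing key, and the confident flag is only recorded in the result details: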

states:
  fix:
    action: "/ll:manage-issue bug fix"
    evaluate:
      type: llm_structured
      min_confidence: 0.7
    route:
      yes: "verify"
      no: "fix"
      blocked: "escalate"
      partial: "probe"
      _: "fix"

To route on confidence, enable uncertain_suffix so that low-confidence verdicts get their own route keys:

states:
  fix:
    action: "/ll:manage-issue bug fix"
    evaluate:
      type: llm_structured
      min_confidence: 0.7
      # When confidence < min_confidence, verdict becomes "<verdict>_uncertain"
      # e.g., "yes" → "yes_uncertain"
      uncertain_suffix: true
    route:
      yes: "verify"
      yes_uncertain: "probe"    # High confidence yes → verify, low → probe
      no: "fix"
      no_uncertain: "probe"
      blocked: "escalate"
      _: "fix"

Implementation¶

# little_loops/fsm/evaluators.py

import json
import subprocess

from little_loops.fsm.schema import DEFAULT_LLM_MODEL  # default model constant (see schema.py)
from little_loops.host_runner import resolve_host

def evaluate_llm_structured(
    output: str,
    prompt: str | None,
    schema: dict,
    min_confidence: float = 0.5,
    uncertain_suffix: bool = False,
    model: str = DEFAULT_LLM_MODEL,  # Default from schema.py
    max_tokens: int = 256,
    timeout: int = 1800,
) -> dict:
    """
    Evaluate action output using an LLM with structured output.

    This is the ONLY place in the FSM system that uses LLM structured output.
    Dispatches through ``host_runner.resolve_host()`` and shells out to the
    resolved host CLI (e.g. ``claude``) — no Anthropic SDK dependency.
    """
    default_prompt = "Evaluate whether this action succeeded based on its output."
    eval_prompt = prompt or default_prompt

    # Truncate output to avoid context limits
    truncated = output[-4000:] if len(output) > 4000 else output
    user_prompt = f"{eval_prompt}\n\n<action_output>\n{truncated}\n</action_output>"

    invocation = resolve_host().build_blocking_json(prompt=user_prompt, model=model)
    args = list(invocation.args) + [
        "--json-schema", json.dumps(schema),
        "--no-session-persistence",
    ]
    proc = subprocess.run(
        [invocation.binary, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )

    # Parse the CLI's JSON envelope; --json-schema returns the validated
    # dict under "structured_output" on success.
    envelope = json.loads(proc.stdout.strip())
    llm_result = envelope.get("structured_output") or envelope.get("result") or envelope

    # Build result with confidence flag
    verdict = llm_result["verdict"]
    confidence = llm_result.get("confidence", 1.0)
    confident = confidence >= min_confidence

    # Optionally modify verdict for low confidence
    if uncertain_suffix and not confident:
        verdict = f"{verdict}_uncertain"

    return {
        "verdict": verdict,
        "details": {
            "confidence": confidence,
            "confident": confident,
            "reason": llm_result.get("reason", ""),
            "raw": llm_result,
        }
    }

Cost and Performance¶

Metric	Value
Cost per evaluation	~$0.001
Latency	300-800ms
Context used	Last ~4000 characters of action output

Tier 3: MCP Evaluator¶

Evaluates results from action_type: mcp_tool states. Unlike Tier 1 (deterministic) and Tier 2 (LLM-based) evaluators, this evaluator interprets the outcome of an external MCP subprocess by parsing its protocol envelope; it is neither a simple exit-code check nor an LLM call.

`mcp_result`¶

Evaluate an MCP tool call result. All inputs (output and exit_code) are injected by the executor from the MCP subprocess — type: mcp_result is the entire YAML config; there are no user-configurable fields.

evaluate:
  type: mcp_result

Scenario	Verdict
Tool ran successfully (isError: false)	`success`
Tool returned an error (isError: true)	`tool_error`
Server or tool not in `.mcp.json` (exit 127)	`not_found`
Transport-level timeout (exit 124)	`timeout`

Result details: { exit_code: <int>, envelope: <object> }

Important: mcp_result is the only built-in evaluator that returns success as a verdict (not yes). The on_yes/on_no shorthand will not match any of its verdicts. A full route: table is required:

states:
  call_tool:
    action_type: mcp_tool
    # ... mcp_tool config ...
    evaluate:
      type: mcp_result
    route:
      success: "next_state"
      tool_error: "handle_error"
      not_found: "abort"
      timeout: "retry"
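
The verdict table for mcp_result can be sketched as follows. The function name is illustrative, and two details are assumptions: an envelope that is not valid JSON is reported as tool_error, and a missing isError field is read as false.

```python
import json


def evaluate_mcp_result(output: str, exit_code: int) -> dict:
    """Map an MCP subprocess result to the verdicts in the table above."""
    envelope = {}
    if exit_code == 127:
        verdict = "not_found"
    elif exit_code == 124:
        verdict = "timeout"
    else:
        try:
            envelope = json.loads(output)
            parsed = isinstance(envelope, dict)
        except json.JSONDecodeError:
            parsed = False
        # Assumption: an unparseable envelope counts as a tool error
        if parsed and not envelope.get("isError", False):
            verdict = "success"
        else:
            verdict = "tool_error"
    return {
        "verdict": verdict,
        "details": {"exit_code": exit_code, "envelope": envelope},
    }
```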

Tier 4: Benchmark Evaluator¶

Evaluates results from Harbor-format benchmark scorer subprocesses.

`harbor_scorer`¶

Evaluate a Harbor-format benchmark scorer result. The scorer is a shell command that prints a bare float (0.0–1.0) to stdout and exits 0 on success. No user-configurable fields — type: harbor_scorer is the entire YAML config.

evaluate:
  type: harbor_scorer

Scenario	Verdict
Exit 0 and stdout parses as float	`yes`
Exit non-zero	`no`
Exit 0 and stdout is not a valid float	`error`

Result details: { score: <float>, exit_code: 0 } (on yes); { exit_code: <int> } (on no).
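
As a sketch of the three rows above (the function name is illustrative, and the score is not range-checked because the table only requires that stdout parses as a float):

```python
def evaluate_harbor_scorer(output: str, exit_code: int) -> dict:
    """Turn a scorer's exit code and stdout into a verdict."""
    if exit_code != 0:
        return {"verdict": "no", "details": {"exit_code": exit_code}}
    try:
        score = float(output.strip())
    except ValueError:
        return {"verdict": "error", "details": {"exit_code": exit_code}}
    return {"verdict": "yes", "details": {"score": score, "exit_code": 0}}
```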

The harbor_scorer evaluator is used with the run_benchmark fragment from lib/benchmark.yaml:

import:
  - lib/benchmark.yaml

states:
  score:
    fragment: run_benchmark
    action: "my-scorer ${context.tasks_dir}"
    on_yes: pass
    on_no: fail

Evaluation Source¶

By default, evaluators examine the current action's stdout. Use source to evaluate something else:

states:
  measure:
    # grep -c already prints 0 (and exits 1) when nothing matches, so only reset the exit code
    action: "mypy src/ | grep -c error || true"
    capture: "errors"
    next: "decide"

  decide:
    # No action—evaluate captured value from previous state
    evaluate:
      type: output_numeric
      source: "${captured.errors.output}"
      operator: eq
      target: 0
    route:
      yes: "done"
      no: "fix"

This enables decision states that branch without executing an action.

Variable Interpolation¶

Actions, evaluators, and routes can reference dynamic values using ${namespace.path} syntax.

Available Namespaces¶

Namespace	Description	Lifetime
`context`	User-defined variables from `context:` block	Entire loop
`captured`	Values stored via `capture:` in previous states	Entire loop
`prev`	Shorthand for previous state's result	Current state only
`result`	Current evaluation result	Current state only
`state`	Current execution metadata	Current state only
`loop`	Loop-level metadata	Entire loop
`env`	Environment variables	Entire loop

Namespace Contents¶

`context` - User-Defined Variables¶

context:
  target_dir: "src/"
  max_errors: 10
  metric_cmd: "mypy ${context.target_dir}"

states:
  check:
    action: "${context.metric_cmd}"

`captured` - Stored Action Results¶

When a state uses capture: "varname":

captured:
  varname:
    output: "..."       # stdout
    stderr: "..."       # stderr
    exit_code: 0        # exit code
    duration_ms: 1234   # execution time

`prev` - Previous State Shorthand¶

${prev.output}      # stdout from previous state
${prev.exit_code}   # exit code from previous state
${prev.state}       # name of previous state

`result` - Current Evaluation Result¶

After evaluation runs, access the result:

${result.verdict}           # The verdict string
${result.details.confidence} # For LLM evaluation
${result.details.reason}     # For LLM evaluation
${result.details.value}      # For numeric evaluation

Example - logging evaluation details:

states:
  fix:
    action: "/ll:manage-issue bug fix"
    evaluate:
      type: llm_structured
    route:
      yes: "log_success"
      no: "log_failure"
      _: "fix"

  log_success:
    action: "echo 'Fixed: ${result.details.reason}' >> .loops/fix.log"
    next: "verify"

  log_failure:
    action: "echo 'Failed (${result.details.confidence}): ${result.details.reason}' >> .loops/fix.log"
    next: "fix"

`state` - Current Execution Context¶

${state.name}       # current state name
${state.iteration}  # loop iteration (1-indexed)

`loop` - Loop Metadata¶

${loop.name}        # loop identifier
${loop.started_at}  # ISO 8601 timestamp
${loop.elapsed_ms}  # milliseconds since start
${loop.elapsed}     # human-readable (e.g., "2m 34s")

`env` - Environment Variables¶

${env.HOME}
${env.CI}
${env.PATH}

Resolution Rules¶

Rule	Behavior
Timing	Resolved at runtime, just before use
Undefined variable	Loop terminates with error
Empty value	Interpolates as empty string
Escaping	Use `$${` for literal `${`; bash parameter expansion operators (`:-`, `:+`, `[@]`, etc.) inside `$${...}` pass through unchanged to the shell
Bash default values	`${var:-default}` syntax is resolved by the interpolation engine at runtime — if `var` is defined and non-empty its value is used; if absent or empty, `default` is substituted. Use this for optional context variables.
Nesting	Not supported
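
A few of these rules can be illustrated with a small regex-based substitution. This is a sketch under stated assumptions, not the engine's implementation: the interpolate function and its namespaces argument are hypothetical, and the ${var:-default} form is intentionally not handled.

```python
import re

# Match the "$${" escape first, then "${namespace.path}"
_TOKEN = re.compile(r"\$\$\{|\$\{([^}]+)\}")


def interpolate(template: str, namespaces: dict) -> str:
    """Replace ${namespace.path} references; "$${" yields a literal "${"."""

    def replace(match):
        if match.group(0) == "$${":
            return "${"
        path = match.group(1)
        value = namespaces
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                # Undefined variables are errors rather than empty strings
                raise KeyError(f"undefined variable: ${{{path}}}")
            value = value[part]
        return "" if value is None else str(value)

    return _TOKEN.sub(replace, template)
```

For example, with namespaces = {"context": {"target_dir": "src/"}}, the template "mypy ${context.target_dir}" becomes "mypy src/", while "echo $${HOME}" is passed through as "echo ${HOME}" for the shell to expand.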

Complete Examples¶

Example 1: Simple Shell Command Loop¶

# Simplest case - all defaults
name: "test-until-pass"
initial: "test"
states:
  test:
    action: "pytest"
    on_yes: "done"
    on_no: "fix"
  fix:
    action: "git stash pop"    # Try a stashed fix
    next: "test"
  done:
    terminal: true
max_iterations: 5

Example 2: Slash Command with LLM Evaluation¶

name: "fix-types"
initial: "check"
states:
  check:
    action: "mypy src/"
    on_yes: "done"
    on_no: "fix"

  fix:
    action: "/ll:manage-issue bug fix"
    # Default: evaluate with llm_structured, route on yes/no
    on_yes: "check"
    on_no: "check"    # Retry even on failure (up to max_iterations)

  done:
    terminal: true
max_iterations: 10

Example 3: Custom Verdicts with Full Routing¶

name: "smart-fix"
initial: "fix"
states:
  fix:
    action: "/ll:manage-issue bug fix"
    evaluate:
      type: llm_structured
      min_confidence: 0.7
    route:
      yes: "verify"
      no: "fix"
      blocked: "escalate"
      partial: "probe"
      _: "fix"

  probe:
    # Deterministic fallback when LLM is uncertain
    # grep -c already prints 0 (and exits 1) when nothing matches, so only reset the exit code
    action: "mypy src/ 2>&1 | grep -c 'error:' || true"
    evaluate:
      type: output_numeric
      operator: eq
      target: 0
    route:
      yes: "done"
      no: "fix"

  verify:
    action: "pytest tests/"
    on_yes: "done"
    on_no: "fix"

  escalate:
    action: "echo 'Blocked: ${result.details.reason}' | notify-team"
    terminal: true

  done:
    terminal: true
max_iterations: 15

Example 4: Convergence Loop¶

name: "reduce-errors"
initial: "measure"
context:
  target: 0

states:
  measure:
    # grep -c prints 0 itself when nothing matches; `|| true` only resets the exit code
    action: "mypy src/ 2>&1 | grep -c 'error:' || true"
    capture: "errors"
    evaluate:
      type: convergence
      target: "${context.target}"
      previous: "${prev.output}"
    route:
      target: "report"
      progress: "fix"
      stall: "report"

  fix:
    action: "/ll:manage-issue bug fix"
    next: "measure"

  report:
    # Non-terminal on purpose: actions on terminal states are skipped
    action: "echo 'Finished with ${captured.errors.output} errors'"
    next: "done"

  done:
    terminal: true
max_iterations: 20

Example 5: CI-Friendly (No LLM Evaluation)¶

name: "ci-checks"
initial: "lint"

# Disable LLM evaluation entirely
llm:
  enabled: false

states:
  lint:
    action: "ruff check src/"
    on_yes: "typecheck"
    on_no: "fix_lint"

  fix_lint:
    action: "ruff check src/ --fix"
    next: "lint"

  typecheck:
    action: "mypy src/"
    on_yes: "test"
    on_no: "done"    # Can't auto-fix types

  test:
    action: "pytest --tb=short"
    on_yes: "done"
    on_no: "done"

  done:
    terminal: true
max_iterations: 3
timeout: 600

Example 6: Complex Workflow with Confidence Routing¶

name: "safe-refactor"
initial: "analyze"

llm:
  model: "${DEFAULT_LLM_MODEL}"  # Uses default from schema.py

states:
  analyze:
    action: "/ll:audit-architecture patterns"
    evaluate:
      type: llm_structured
      schema:
        type: object
        properties:
          verdict:
            type: string
            enum: ["found_opportunities", "no_opportunities"]
          opportunities:
            type: array
            items: { type: string }
          confidence:
            type: number
        required: ["verdict", "confidence"]
    route:
      found_opportunities: "refactor"
      no_opportunities: "done"
      _: "done"

  refactor:
    action: "/ll:manage-issue enhancement implement"
    evaluate:
      type: llm_structured
      min_confidence: 0.8
      uncertain_suffix: true
    route:
      yes: "verify"
      yes_uncertain: "manual_check"
      no: "refactor"
      blocked: "rollback"
      _: "refactor"

  verify:
    action: "pytest && mypy src/"
    timeout: 300
    on_yes: "done"
    on_no: "rollback"

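  manual_check:
    # Non-terminal on purpose: actions on terminal states are skipped
    action: "echo 'Low confidence refactor - needs review: ${result.details.reason}'"
    next: "needs_review"

  needs_review:
    terminal: true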

  rollback:
    action: "git checkout -- src/"
    next: "done"

  done:
    terminal: true

max_iterations: 5
timeout: 1800

Error Handling¶

Error vs Failure¶

Outcome	Trigger	Default Behavior
Success	Evaluator returns `yes` verdict	Route via `yes` key or `on_yes`
Failure	Evaluator returns `no` verdict	Route via `no` key or `on_no`
Error	Execution crash, timeout, eval error	Route via `_error` or terminate loop

Customizing Error Handling¶

states:
  check:
    action: "pytest"
    on_yes: "deploy"
    on_no: "fix"
    on_error: "alert"           # Go to recovery state

  # Or the same state written with full routing (use one form, not both)
  check:
    action: "pytest"
    route:
      yes: "deploy"
      no: "fix"
      _error: "alert"

Retry Current State¶

Use the special $current token:

states:
  flaky_test:
    action: "pytest tests/integration/"
    route:
      yes: "done"
      no: "$current"           # Retry this state
      _error: "$current"       # Also retry on errors

Retries respect max_iterations.
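The routing rules described so far (shorthand keys, the `_` catch-all, the `_error` route, and the `$current` token) can be sketched as one lookup. This is an illustrative sketch only: the function name `resolve_next_state`, the `RoutingError` class, and the assumption that `_` does not catch errors are not taken from the engine's source.

```python
class RoutingError(Exception):
    """No route matched; per the table above, the loop would terminate."""


def resolve_next_state(current: str, state: dict, verdict: str) -> str:
    """Hypothetical helper mapping a verdict to the next state name."""
    # Fold the shorthand keys into one table; explicit `route` entries win.
    table = {}
    for shorthand, key in (("on_yes", "yes"), ("on_no", "no"), ("on_error", "_error")):
        if shorthand in state:
            table[key] = state[shorthand]
    table.update(state.get("route", {}))

    if verdict == "error":
        # Assumption: errors only match `_error`, never the `_` catch-all
        target = table.get("_error")
    else:
        target = table.get(verdict, table.get("_"))

    if target is None:
        raise RoutingError(f"no route for verdict {verdict!r} in state {current!r}")
    # `$current` re-enters the same state (a retry)
    return current if target == "$current" else target
```

A retry through `$current` is an ordinary transition in this sketch, which is consistent with retries counting against `max_iterations`.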

LLM Evaluation Errors¶

When LLM evaluation fails (API error, timeout, invalid response):

If _error route exists → use it
Otherwise → terminate loop with error

states:
  fix:
    action: "/ll:manage-issue bug fix"
    evaluate:
      type: llm_structured
    route:
      yes: "verify"
      no: "fix"
      _error: "probe"          # Fall back to deterministic on LLM failure

Timeouts¶

Three levels of timeout protection:

# Loop-level: max wall-clock time for entire loop
timeout: 3600                   # 1 hour

# Loop-level: default per-state action timeout (fallback for states without timeout:)
default_timeout: 3600           # 1 hour for each prompt/action state

states:
  build:
    action: "npm run build"
    timeout: 300                # 5 min for this action (overrides default_timeout)

Action timeout (state.timeout): Per-state override; catches hung processes
Default timeout (default_timeout): Loop-level fallback applied to every state that does not set its own timeout. When neither is set, the hardcoded fallback is 3600s for prompt/LLM states and 30s for MCP tool calls
Loop timeout (timeout): Bounds total wall-clock execution time (independent of per-state timeouts)

LLM evaluation has its own timeout (default 1800s), configured at loop level. The example below lowers it to 45 seconds per call:

llm:
  timeout: 45    # Seconds per LLM evaluation call
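The precedence among the per-action timeouts described above can be written as a small lookup. This is a sketch under stated assumptions: the function name and the `"prompt"` / `"mcp_tool"` labels are hypothetical, and returning `None` for other action kinds simply reflects that the text names no hardcoded fallback for them.

```python
from typing import Optional

# Hardcoded fallbacks quoted in the text; the dictionary keys are hypothetical labels.
FALLBACK_TIMEOUT_S = {"prompt": 3600, "mcp_tool": 30}


def effective_action_timeout(state: dict, loop: dict, action_kind: str) -> Optional[int]:
    """Resolve the per-action timeout in seconds: state, then loop default, then fallback."""
    if "timeout" in state:
        return state["timeout"]          # per-state override
    if "default_timeout" in loop:
        return loop["default_timeout"]   # loop-level default
    return FALLBACK_TIMEOUT_S.get(action_kind)  # None when no fallback is documented
```

The loop-level `timeout` is deliberately absent here, since it bounds total wall-clock time independently of any single action.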

Concurrency and Locking¶

Loops whose scopes overlap cannot run at the same time; loops with disjoint scopes can run in parallel.

Scope Declaration¶

name: "fix-api-types"
scope:
  - "src/api/"
  - "tests/api/"

Declaration	Behavior
Explicit paths	Loop claims those paths
No scope	Treated as `["."]` (whole project)

Overlap Rules¶

Scenario	Behavior
No other loop running	Start immediately
Non-overlapping scopes	Start immediately (parallel OK)
Overlapping scopes	Fail by default; with `--queue`, wait for the conflicting loop

# Fail on conflict (default)
ll-loop run .loops/fix-types.yaml

# Wait for conflicting loop
ll-loop run .loops/fix-types.yaml --queue
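One plausible reading of "overlapping scopes" is that two paths overlap when one contains the other. The sketch below implements that reading by comparing path components; the function names are hypothetical, and the real lock manager may normalise paths differently (for example by resolving symlinks or `..`).

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def normalise_scope(scope: Optional[Iterable[str]]) -> list:
    """A missing or empty scope is treated as ["."], the whole project."""
    paths = list(scope or [])
    return paths or ["."]


def paths_overlap(a: str, b: str) -> bool:
    """True when one path equals or contains the other, compared by component."""
    pa, pb = PurePosixPath(a).parts, PurePosixPath(b).parts
    shorter = min(len(pa), len(pb))
    # "." has no components, so it overlaps with every path
    return pa[:shorter] == pb[:shorter]


def scopes_overlap(first: Optional[Iterable[str]], second: Optional[Iterable[str]]) -> bool:
    """True when any path claimed by one loop overlaps any path claimed by the other."""
    return any(
        paths_overlap(a, b)
        for a in normalise_scope(first)
        for b in normalise_scope(second)
    )
```

Comparing components rather than raw string prefixes keeps `src/api` and `src/api-v2` from being treated as overlapping.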

Security Model¶

Execution Context¶

Loops are executed only via the ll-loop CLI command—never via slash commands. The /ll:create-loop command helps author loops, but execution requires explicit CLI invocation.

Autonomous Execution¶

All slash command actions use --dangerously-skip-permissions. This is non-negotiable for autonomous execution. Users accept this trade-off when they run a loop.

Aspect	Policy
User approval before loop	None required
Permission prompts during loop	Disabled (`--dangerously-skip-permissions`)
"Blessing" reviewed loops	Not supported—delete invalid loops

Risk Mitigation¶

Review before running: Users should read loop definitions before execution
Iteration limits: max_iterations prevents runaway loops
Timeouts: Action and loop-level timeouts bound execution
Scoped operations: scope declaration limits file system impact

File Structure¶

Canonical Location¶

All loop definitions live in .loops/:

.loops/
├── fix-types.yaml          # User-defined loop
├── lint-cycle.yaml         # User-defined loop
└── .running/               # Runtime state (auto-managed)
    ├── fix-types-20260503T122306.state.json
    └── fix-types-20260503T122306.events.jsonl

Relationship to `.issues/`¶

The .issues/ directory is separate and serves a different purpose:

Directory	Purpose	Used By
`.loops/`	FSM loop definitions	`ll-loop`
`.issues/`	Issue tracking files	`ll-auto`, `ll-parallel`, `/ll:manage-issue`

FSM loops can orchestrate tools that consume .issues/, but the directories remain independent.

Templates¶

No loop templates are provided. All loops are user-defined. Future versions may add templates for common workflows.

CLI Interface¶

Command: `ll-loop`¶

# Run a loop (primary usage - loop name resolves to .loops/<name>.yaml)
ll-loop test-analyze-fix
ll-loop fix-types --max-iterations 5
ll-loop lint-cycle --background

# Explicit run subcommand (alternative)
ll-loop run fix-types --dry-run
ll-loop run .loops/fix-types.yaml    # Full path also works

# Validate loop definition
ll-loop validate fix-types

# Manage running loops
ll-loop list
ll-loop list --running
ll-loop status fix-types
ll-loop stop fix-types
ll-loop resume fix-types

# History
ll-loop history fix-types

Run Flags¶

Flag	Description
`--background`	Run as daemon (mutually exclusive with `--worktree`)
`--worktree`	Run in an isolated git worktree (mutually exclusive with `--background`)
`--dry-run`	Show execution plan
`--queue`	Wait for conflicting loops
`--max-iterations N`	Override limit
`--no-llm`	Disable LLM evaluation
`--llm-model MODEL`	Override LLM model

Execution Engine¶

Flow¶

Load FSM from YAML
Set state to initial
Execute action (shell or Claude CLI)
Evaluate result (deterministic or LLM)
Route to next state based on verdict
Emit event
Repeat until terminal or limits reached
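The flow above can be sketched as a short driver loop. This is a simplified illustration, not the real executor: `run_fsm`, its callable parameters, and the `steps` counter (one per executed state, which may not match how the engine counts iterations) are assumptions, and capture, timeouts, and persistence are omitted.

```python
from typing import Callable, Optional


def run_fsm(
    fsm: dict,
    run_action: Callable[[str], dict],
    evaluate: Callable[[dict, dict], str],
    emit: Optional[Callable[[dict], None]] = None,
) -> dict:
    """Hypothetical driver: execute, evaluate, route, emit, repeat."""
    emit = emit or (lambda event: None)
    current = fsm["initial"]
    steps = 0
    emit({"event": "loop_start", "loop": fsm.get("name")})
    while True:
        state = fsm["states"][current]
        if state.get("terminal"):
            reason = "terminal"          # finish before running any action
            break
        if steps >= fsm["max_iterations"]:
            reason = "max_iterations"
            break
        steps += 1
        result = run_action(state["action"])
        if "next" in state:
            verdict, target = None, state["next"]   # unconditional transition
        else:
            verdict = evaluate(state, result)
            routes = {"yes": state.get("on_yes"), "no": state.get("on_no")}
            routes.update(state.get("route", {}))
            target = routes.get(verdict) or routes.get("_")
            if target is None:
                reason = "error"         # no matching route
                break
        emit({"event": "route", "from": current, "to": target, "verdict": verdict})
        current = target
    emit({"event": "loop_complete", "final_state": current})
    return {"final_state": current, "terminated_by": reason, "steps": steps}
```

Passing `run_action` and `evaluate` as callables is a design choice for the sketch: it keeps the loop testable without shelling out or calling an LLM.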

Action Execution¶

Type	Detection	Method
Shell	No leading `/`	`subprocess.run()`
Slash	Leading `/`	`claude --dangerously-skip-permissions -p "..."`
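The detection rule in the table can be sketched as follows. The helper names are hypothetical, and running shell actions with `shell=True` is an assumption made because the examples in this document use pipes and `||`; the `claude` arguments are the ones shown in the table.

```python
import subprocess
from typing import Optional


def build_command(action: str) -> dict:
    """Classify an action by its leading character, as in the table above."""
    if action.startswith("/"):
        # Slash command: hand the text to the Claude CLI as a prompt
        return {"args": ["claude", "--dangerously-skip-permissions", "-p", action], "shell": False}
    # Anything else is run through the shell (assumption: needed for pipes)
    return {"args": action, "shell": True}


def run_action(action: str, timeout: Optional[float] = None) -> dict:
    """Run the action and return its stdout and exit code."""
    command = build_command(action)
    completed = subprocess.run(
        command["args"],
        shell=command["shell"],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
    return {"output": completed.stdout, "exit_code": completed.returncode}
```

One consequence of the rule as stated is that a shell command given by absolute path (for example `/usr/bin/make`) would be classified as a slash command.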

State Persistence¶

// .loops/.running/<instance-id>.state.json
{
  "current_state": "fix",
  "iteration": 3,
  "captured": {
    "errors": { "output": "4", "exit_code": 0 }
  },
  "last_result": {
    "verdict": "no",
    "details": { "confidence": 0.65, "reason": "..." }
  },
  "started_at": "2024-01-15T10:30:00Z"
}

Structured Events¶

Events stream to .loops/.running/<instance-id>.events.jsonl:

{"event": "loop_start", "loop": "fix-types", "ts": "..."}
{"event": "state_enter", "state": "check", "iteration": 1, "ts": "..."}
{"event": "action_start", "action": "mypy src/", "ts": "..."}
{"event": "action_complete", "exit_code": 1, "duration_ms": 2340, "ts": "..."}
{"event": "evaluate", "type": "exit_code", "verdict": "no", "ts": "..."}
{"event": "route", "from": "check", "to": "fix", "verdict": "no", "ts": "..."}
{"event": "state_enter", "state": "fix", "iteration": 1, "ts": "..."}
{"event": "action_start", "action": "/ll:manage-issue bug fix", "ts": "..."}
{"event": "action_complete", "duration_ms": 45000, "ts": "..."}
{"event": "evaluate", "type": "llm_structured", "verdict": "yes", "confidence": 0.92, "ts": "..."}
{"event": "route", "from": "fix", "to": "verify", "verdict": "yes", "ts": "..."}
{"event": "loop_complete", "final_state": "done", "iterations": 3, "ts": "..."}
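Because each event is one JSON object per line, the stream is easy to post-process. The sketch below is a hypothetical helper (not part of `ll-loop`) that assumes only the field names visible in the sample above.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_events(lines: Iterable[str]) -> dict:
    """Collect verdict counts, transitions, and the final state from JSONL events."""
    verdicts = Counter()
    transitions = []
    final_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        kind = event.get("event")
        if kind == "evaluate":
            verdicts[event["verdict"]] += 1
        elif kind == "route":
            transitions.append((event["from"], event["to"]))
        elif kind == "loop_complete":
            final_state = event["final_state"]
    return {
        "verdicts": dict(verdicts),
        "transitions": transitions,
        "final_state": final_state,
    }
```

Any iterable of lines works as input, so an open file handle on the `.events.jsonl` file can be passed directly.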

CLI Progress Display¶

$ ll-loop run fix-types
[1/20] check → mypy src/
       ✗ failure (exit 1)
       → fix
[1/20] fix → /ll:manage-issue bug fix
       ✓ success (confidence: 0.92)
       → verify
[1/20] verify → pytest tests/
       ✓ success (exit 0)
       → done

Loop completed: done (1 iteration, 2m 34s)

Authoring Conventions¶

Failure Terminals Must Include a Diagnostic Action¶

A terminal: true state that signals failure (as opposed to a successful done) must be paired with a preceding non-terminal diagnose state that runs an action_type: prompt action before routing to the bare terminal. A failure terminal with no action produces a blank entry in ll-loop history — the only visible signal is the state name.

Important: do NOT put action_type: prompt directly on the terminal: true state. The FSM runner calls _finish("terminal") before executing any terminal-state action, so the action is silently skipped. The correct pattern is the same two-state split used for successful completion (report → done): a non-terminal diagnose state runs the prompt and routes next: failed, while failed stays as a bare terminal: true anchor.

  diagnose:
    action_type: prompt
    action: |
      The loop has terminated with an unrecoverable error.

      Diagnose what failed:
      - Check recent captured outputs for error signals.
      - Identify which state most likely caused the failure.
      - Write a one-paragraph diagnostic summary.
    next: failed

  failed:
    terminal: true

All states that previously routed directly to failed must instead route to diagnose:

  some_state:
    on_error: diagnose   # not: on_error: failed
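Since an action on a terminal state is silently skipped, a definition that makes this mistake is easy to overlook. The following sketch shows how such states could be flagged in a loaded FSM dictionary; the function name is hypothetical and this check is not claimed to be part of `ll-loop validate`.

```python
def find_terminal_states_with_actions(fsm: dict) -> list:
    """Return the names of terminal states that also declare an action."""
    return sorted(
        name
        for name, state in fsm.get("states", {}).items()
        if state.get("terminal") and "action" in state
    )
```

An empty result means every terminal state is a bare anchor, as the convention above requires.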

Generator-Evaluator Loops: Never Route Evaluate Failures Back to Generate¶

In any generator-evaluator pattern (generate → evaluate → score → generate), the evaluate state's on_no and on_error routes must point forward — to score or another evaluation fallback — not back to generate. Routing back creates an undetectable infinite cycle: generate routes unconditionally to evaluate, which fails again for the same reason, repeating until max_iterations is exhausted with zero useful output.

When the external tool used by evaluate (Playwright, a linter, a build system) is unavailable, the correct fallback is to degrade to an alternative evaluation — LLM judgment of the artifact source, for example — not to regenerate the artifact.

  evaluate:
    # CORRECT: route failures forward to score (LLM fallback)
    on_yes: score
    on_no: score    # Playwright absent → LLM evaluates HTML source directly
    on_error: score

  # WRONG: routes back to generate, which routes back to evaluate — infinite cycle
  # on_no: generate
  # on_error: generate

Design Decisions¶

Why Two Layers (Evaluate + Route)?¶

Single Responsibility - Evaluators don't know about states; routing doesn't know about output parsing
Testability - Test each layer in isolation
Extensibility - Add evaluators without touching routing
Clarity - When debugging, you can ask "what verdict?" and "where did it route?" separately

Why LLM Structured Output Only for Evaluation?¶

Single Point of LLM Usage - Easy to understand cost model
Deterministic Core - Loading, routing, and interpolation never call an LLM, so a given sequence of verdicts always produces the same path
Testable - Mock one function to test everything else

Alternatives Considered¶

Single-layer with escape hatch: Simpler concept count, but the escape hatch creates a hidden mode switch. The two-layer model is honest about what's happening.

Predicate-based routing: More powerful but requires a predicate language. Verdict strings are simpler and sufficient.

Testing Strategy¶

The two-layer design (evaluate + route) enables isolated testing of each layer. The single LLM touchpoint means only one mock strategy is needed for all LLM-related tests.

1. Unit Tests for Evaluators¶

Tier 1 evaluators are pure functions—deterministic and fast.

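# tests/unit/test_evaluators.py
import pytest

# Assumed import path; adjust to wherever the Tier 1 evaluators live.
from little_loops.fsm.evaluators import evaluate_convergence, evaluate_exit_code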

class TestExitCodeEvaluator:
    @pytest.mark.parametrize("exit_code,expected_verdict", [
        (0, "yes"),
        (1, "no"),
        (2, "error"),
        (127, "error"),
    ])
    def test_exit_code_mapping(self, exit_code, expected_verdict):
        result = evaluate_exit_code(exit_code)
        assert result["verdict"] == expected_verdict
        assert result["details"]["exit_code"] == exit_code


class TestConvergenceEvaluator:
    def test_target_reached(self):
        result = evaluate_convergence(
            current=0, previous=5, target=0, tolerance=0
        )
        assert result["verdict"] == "target"

    def test_progress_made(self):
        result = evaluate_convergence(
            current=3, previous=5, target=0, tolerance=0
        )
        assert result["verdict"] == "progress"
        assert result["details"]["delta"] == -2

    def test_stall_detected(self):
        result = evaluate_convergence(
            current=5, previous=5, target=0, tolerance=0
        )
        assert result["verdict"] == "stall"
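For reference, here are minimal implementations that would satisfy the tests above. They are inferred from the test expectations only and are not the project's actual evaluators; in particular, treating "progress" as "strictly closer to the target than before" is an assumption, and a missing `previous` value on the first iteration is not handled.

```python
def evaluate_exit_code(exit_code: int) -> dict:
    """0 → yes, 1 → no, anything else → error (the mapping exercised by the tests)."""
    if exit_code == 0:
        verdict = "yes"
    elif exit_code == 1:
        verdict = "no"
    else:
        verdict = "error"
    return {"verdict": verdict, "details": {"exit_code": exit_code}}


def evaluate_convergence(current: float, previous: float, target: float, tolerance: float = 0) -> dict:
    """Classify a measurement as target, progress, or stall."""
    delta = current - previous
    if abs(current - target) <= tolerance:
        verdict = "target"      # within tolerance of the goal
    elif abs(current - target) < abs(previous - target):
        verdict = "progress"    # strictly closer than last time (assumption)
    else:
        verdict = "stall"       # no improvement, or moving away
    return {"verdict": verdict, "details": {"delta": delta}}
```

Both functions are pure, which is what lets the tests above run without mocks.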

2. Mock Strategy for LLM Evaluation¶

LLM structured output is used in only one place: evaluate_llm_structured. Because the function shells out via host_runner rather than calling the Anthropic SDK, tests mock subprocess.run (and optionally resolve_host) — see scripts/tests/test_fsm_evaluators.py::TestLLMStructuredEvaluator for the canonical fixture pattern.

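# tests/unit/test_llm_evaluator.py
import json
from unittest.mock import MagicMock, patch

# Assumed import path (it matches the patch target used in the test).
# DEFAULT_SCHEMA's location is likewise an assumption.
from little_loops.fsm.evaluators import DEFAULT_SCHEMA, evaluate_llm_structured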

class TestLLMEvaluator:
    def test_uncertain_suffix_applied(self):
        """Low confidence + uncertain_suffix=True → yes_uncertain."""
        envelope = json.dumps({
            "type": "result",
            "subtype": "success",
            "structured_output": {"verdict": "yes", "confidence": 0.5},
        })
        with patch("little_loops.fsm.evaluators.subprocess.run") as mock_run:
            mock_run.return_value = MagicMock(returncode=0, stdout=envelope, stderr="")
            result = evaluate_llm_structured(
                output="Fixed the bug",
                schema=DEFAULT_SCHEMA,
                min_confidence=0.7,
                uncertain_suffix=True,
            )

        assert result.verdict == "yes_uncertain"
        assert result.details["confident"] is False

Mock implementation: Patch little_loops.fsm.evaluators.subprocess.run to stage the CLI envelope; resolve_host() returns the default ClaudeCodeRunner when claude is on PATH and can be patched directly for hermetic tests.
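The behaviour this test exercises can be sketched in isolation: extract the structured output from the CLI envelope, then apply the confidence threshold. Both helper names are hypothetical, and the envelope shape is assumed to be exactly the one staged in the test above.

```python
import json


def parse_envelope(stdout: str) -> dict:
    """Return the structured_output object from the CLI's JSON envelope."""
    envelope = json.loads(stdout)
    return envelope["structured_output"]


def apply_confidence(structured: dict, min_confidence: float, uncertain_suffix: bool) -> dict:
    """Mark low-confidence verdicts, optionally renaming them with an _uncertain suffix."""
    verdict = structured["verdict"]
    confidence = structured["confidence"]
    confident = confidence >= min_confidence
    if not confident and uncertain_suffix:
        verdict = f"{verdict}_uncertain"  # e.g. yes → yes_uncertain
    return {
        "verdict": verdict,
        "details": {"confidence": confidence, "confident": confident},
    }
```

With `uncertain_suffix` off, the verdict is left unchanged and only `details["confident"]` records that the threshold was missed.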

3. Integration Tests for Executor¶

Test the full state machine execution with mocked externals.

# tests/integration/test_executor.py

class TestExecutor:
    @pytest.fixture
    def mock_action_runner(self):
        """Mock that captures actions and returns configured results."""
        return MockActionRunner()

    def test_simple_success_path(self, mock_action_runner, mock_llm_evaluator):
        """check → done on first success."""
        fsm = load_fsm("test-pass-first.yaml")
        mock_action_runner.set_result("check", exit_code=0)

        executor = FSMExecutor(fsm, action_runner=mock_action_runner)
        result = executor.run()

        assert result.final_state == "done"
        assert result.iterations == 1
        assert mock_action_runner.calls == ["check"]

    def test_fix_retry_loop(self, mock_action_runner, mock_llm_evaluator):
        """check → fix → check → done with retry."""
        fsm = load_fsm("check-fix-retry.yaml")  # hypothetical fixture name
        mock_action_runner.set_results([
            ("check", {"exit_code": 1}),  # First check fails
            ("fix", {"stdout": "Fixed"}),
            ("check", {"exit_code": 0}),  # Second check passes
        ])
        mock_llm_evaluator.set_verdict("fix", "yes")

        executor = FSMExecutor(fsm, ...)
        result = executor.run()

        assert result.final_state == "done"
        assert result.iterations == 2

    def test_max_iterations_respected(self, mock_action_runner):
        """Loop terminates at max_iterations even if not terminal."""
        fsm = {"max_iterations": 3, ...}
        mock_action_runner.always_fail()

        result = FSMExecutor(fsm, ...).run()

        assert result.iterations == 3
        assert result.terminated_by == "max_iterations"

    def test_variable_interpolation(self, mock_action_runner):
        """${context.*} and ${captured.*} resolve correctly."""
        fsm = {
            "context": {"target_dir": "src/"},
            "states": {
                "check": {
                    "action": "mypy ${context.target_dir}",
                    "capture": "errors",
                    ...
                }
            }
        }

        executor = FSMExecutor(fsm, action_runner=mock_action_runner)
        executor.run()

        assert mock_action_runner.last_action == "mypy src/"

4. Test File Organization¶

scripts/tests/
├── unit/
│   ├── test_evaluators.py         # All Tier 1 evaluators
│   ├── test_llm_evaluator.py      # LLM evaluator with mocked subprocess.run
│   ├── test_routing.py            # Verdict → state resolution
│   └── test_interpolation.py      # Variable substitution
├── integration/
│   ├── test_executor.py           # Full FSM execution
│   └── test_state_persistence.py  # Resume from saved state
├── fixtures/
│   ├── loops/                     # Test FSM definitions
│   └── outputs/                   # Sample action outputs
└── conftest.py                    # Shared fixtures, mocks

Summary¶

Layer	Test Type	Mock Strategy
Compilers	Unit	None needed (pure functions)
Tier 1 Evaluators	Unit	None needed (deterministic)
LLM Evaluator	Unit	Mock `subprocess.run` (staged CLI envelope)
Routing	Unit	Provide verdict directly
Variable Interpolation	Unit	Provide context dict
Executor	Integration	Mock action runner + LLM evaluator

The `/ll:create-loop` Command¶

Interactive command for creating new loop configurations. Uses Claude to understand user intent and generate FSM YAML.

Workflow¶

User invokes /ll:create-loop
        ↓
Claude asks clarifying questions (AskUserQuestion)
        ↓
User provides answers (single/multi-select, free text)
        ↓
Claude generates FSM YAML
        ↓
User reviews/edits
        ↓
Loop saved to .loops/<name>.yaml

Example Session¶

> /ll:create-loop

What kind of automation loop do you want to create?
  ○ Fix until clean (Recommended)
  ○ Maintain constraints
  ○ Drive a metric toward a target
  ○ Run a sequence of steps
  ○ Other...

> Fix until clean

What should the loop fix?
  ☑ Type errors (mypy)
  ☑ Lint errors (ruff)
  ☐ Test failures (pytest)
  ☐ Other...

> [Type errors, Lint errors selected]

What's the maximum number of fix attempts?
  ○ 10 (Recommended)
  ○ 20
  ○ 50
  ○ Custom...

> 10

Here's the loop configuration:

name: "fix-types-and-lint"
initial: check_types
max_iterations: 10
states:
  check_types:
    action: "mypy src/"
    on_yes: check_lint
    on_no: fix_types
  fix_types:
    action: "/ll:manage-issue bug fix"
    next: check_types
  check_lint:
    action: "ruff check src/"
    on_yes: all_valid
    on_no: fix_lint
  fix_lint:
    action: "/ll:check-code fix"
    next: check_lint
  all_valid:
    terminal: true

Save to .loops/fix-types-and-lint.yaml? [Y/n]

Question Types¶

The command uses AskUserQuestion with various patterns:

Pattern	Use Case
Single-select	Loop type choice, iteration limits
Multi-select	Which checks to include, which tools to use
Free text	Custom loop names, custom commands

Implementation Notes¶

Command is implemented in commands/create-loop.md
Uses Claude's natural language understanding to map user intent to an FSM structure
Validates generated YAML before saving
Offers to run ll-loop validate <name> after creation

Open Questions¶

Resolved¶

1. ✅ Transition system design - Two-layer (evaluate + route) with shortcuts for common cases
2. ✅ LLM usage scope - Only in the llm_structured evaluator
3. ✅ Confidence handling - Optional uncertain_suffix for compound verdicts
4. ✅ Integration with existing tools - ll-auto and ll-parallel are complementary; they become executable actions within FSM states
5. ✅ Security model - No user approval; always use --dangerously-skip-permissions; no blessing mechanism
6. ✅ File structure - .loops/ is canonical; separate from .issues/; no templates for now
7. ✅ CLI UX - Primary invocation is ll-loop <loop-name> (resolves to .loops/<name>.yaml)
8. ✅ Context handoff integration - Deferred to future; initial implementation terminates on handoff signal
9. ✅ /ll:create-loop command - In scope for v1; interactive command using AskUserQuestion

For Implementation¶

10. Maintain mode timing - Configurable delay between cycles? (Can add a maintain_delay field)

Related Skills¶

`/ll:loop-suggester`¶

Analyzes user message history from ll-messages output to automatically suggest FSM loop configurations. Instead of manually identifying patterns and using /ll:create-loop, this skill detects repeated workflows and generates ready-to-use loop YAML.

When to use:

You want to discover automation opportunities from your existing work patterns
You have repetitive check-fix-verify cycles you'd like to automate
You're unsure which loop pattern fits your workflow

Example workflow:

# Extract recent messages with tool usage metadata
ll-messages --include-response-context -n 200 -o messages.jsonl

# Invoke the skill to analyze and suggest loops
/ll:loop-suggester messages.jsonl

# Review suggestions in .ll/loop-suggestions/
# Copy promising yaml_config to .loops/ and validate

See skills/loop-suggester/SKILL.md for full documentation.

Future Considerations¶

Context handoff integration - Executor can detect CONTEXT_HANDOFF: signals from slash commands and spawn continuation sessions, preserving loop state transparently. Initial implementation may simply terminate.
Composition - Nested loops, sub-FSMs
Hooks - Pre/post state execution
Metrics - Track success rates, durations