
Built-in Loop Reference

This document provides detailed reference information for selected built-in FSM loops. For a full catalog and conceptual guide, see LOOPS_GUIDE.md.


harness-optimize

Category: optimization
File: scripts/little_loops/loops/harness-optimize.yaml

Score-gated hill-climbing on harness artifacts (skills, commands, CLAUDE.md). Each iteration proposes an edit to a declared target file set, runs a Harbor-format benchmark, accepts the change if the score rises (or reaches the target threshold), and reverts it otherwise. Accepted mutations are committed to the current branch. In whole-file mode the run stops on the first stall; in state mode a stall ends work on the current state and the loop advances to the next queued state.
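The accept-or-revert rule described above can be sketched in a few lines of Python. This is a minimal illustration of whole-file mode, not the loop's actual implementation: `hill_climb` and `propose_and_score` are hypothetical names, the strict `>` comparison for "score rises" is an assumption, and the real loop performs the propose, benchmark, commit, and revert steps as separate FSM states.

```python
from typing import Callable, List, Tuple


def hill_climb(
    baseline: float,
    propose_and_score: Callable[[int], float],  # hypothetical: proposes an edit, returns its score
    target_score: float = 1.0,
    max_iterations: int = 30,
) -> Tuple[float, List[dict]]:
    """Accept a candidate when its score rises or reaches the target; stop on the first stall."""
    best = baseline
    log: List[dict] = []
    for iteration in range(1, max_iterations + 1):
        score = propose_and_score(iteration)
        reached_target = score >= target_score
        improved = score > best  # assumption: "rises" means strictly greater
        accepted = improved or reached_target
        log.append({"iter": iteration, "score": score, "accepted": accepted})
        if not accepted:
            break  # stall: the candidate would be reverted and the run ends
        best = max(best, score)
        if reached_target:
            break  # early stop once the threshold is met
    return best, log
```

With a baseline of 0.4 and candidate scores of 0.5, 0.7, and 0.6, the first two candidates are accepted and the third is rejected, so the run ends with a best score of 0.7 after three iterations.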

Invocation

Via .ll/program.md (recommended for overnight runs):

# Populate .ll/program.md with Directive, Targets, Benchmark sections, then:
ll-loop run harness-optimize

Via --context flags:

ll-loop run harness-optimize \
  --context targets="skills/foo/SKILL.md" \
  --context tasks_dir=./benchmarks/foo \
  --context scorer=./scripts/score.sh

Multiple targets (space-separated):

ll-loop run harness-optimize \
  --context "targets=skills/foo/SKILL.md skills/bar/SKILL.md" \
  --context tasks_dir=./benchmarks/foo \
  --context scorer=./scripts/score.sh

See .ll/program.md convention for the steering file format and precedence rules.

Context Variables

| Variable | Default | Description |
|---|---|---|
| targets | "" | Required. Whole-file mode: space-separated file paths to optimize (e.g. "skills/foo/SKILL.md"). State mode: path to a loop YAML file whose targets: block contains states: entries. |
| tasks_dir | "" | Required. Path to Harbor task directory passed to scorer. |
| scorer | "" | Required. Scorer command that prints a bare float to stdout on exit 0. |
| target_score | 1.0 | Early-stop threshold. 1.0 means "never early-stop on target reached". |
| max_iterations | 30 | Hard budget ceiling. |
| STATE_NAME | | State-mode only. Name of the state being optimized; set by dequeue_state and read by propose, apply, and write_trajectory_*. |
| EXAMPLES_FILE | | State-mode only. Path to the examples file for the current state; set by dequeue_state and injected into the propose prompt. |
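As a rough illustration of how these variables combine, the sketch below merges overrides onto the documented defaults, rejects a missing required value, and splits a space-separated targets string. `resolve_context` and the `target_list` key are hypothetical names introduced for this example; how ll-loop validates context internally is not described here.

```python
from typing import Dict, Any

# Defaults as documented in the table above.
DEFAULTS: Dict[str, Any] = {
    "targets": "",
    "tasks_dir": "",
    "scorer": "",
    "target_score": 1.0,
    "max_iterations": 30,
}
REQUIRED = ("targets", "tasks_dir", "scorer")


def resolve_context(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides onto defaults and check the required variables (illustrative only)."""
    ctx = {**DEFAULTS, **overrides}
    missing = [key for key in REQUIRED if not str(ctx[key]).strip()]
    if missing:
        raise ValueError("missing required context: " + ", ".join(missing))
    # Values passed on the command line arrive as strings, so coerce the numeric ones.
    ctx["target_score"] = float(ctx["target_score"])
    ctx["max_iterations"] = int(ctx["max_iterations"])
    # Whole-file mode: several space-separated paths. State mode: a single loop YAML path.
    ctx["target_list"] = str(ctx["targets"]).split()
    return ctx
```

For example, `resolve_context({"targets": "skills/foo/SKILL.md skills/bar/SKILL.md", "tasks_dir": "./benchmarks/foo", "scorer": "./scripts/score.sh"})` yields a two-element `target_list` and keeps the default budget of 30 iterations.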

State Graph

init_run  (shell: create .ll/runs/harness-optimize/<run-id>/ dir, capture traj_path)
  → load_directive  (reads .ll/program.md; builds state queue when targets is a loop YAML)
      on_yes (state-mode: queue non-empty) → check_queue
        on_yes → dequeue_state  (pops STATE_NAME + EXAMPLES_FILE from queue)
          → baseline_score  (fragment: run_benchmark)
              on_yes → init_prev
                → propose  (LLM: extracts state action block; proposes revised action text)
                  → apply  (LLM: writes candidate action via yaml_state_editor.replace_action)
                    → score  (fragment: run_benchmark)
                        on_yes → gate  (convergence evaluator, direction: maximize)
                          target/progress → commit_and_log
                            → write_trajectory_accepted
                                on_yes (state-mode) → check_queue  (advance to next state)
                                on_no  (whole-file)  → capture_prev → propose  (continues)
                          stall/error → revert_and_log
                            → write_trajectory_rejected
                                on_yes (state-mode) → check_queue  (advance to next state)
                                on_no  (whole-file)  → done
                        on_no/on_error → revert_and_log → write_trajectory_rejected → ...
              on_no/on_error → done
        on_no (queue exhausted) → done
      on_no (whole-file mode) → baseline_score  (same subgraph; loops via capture_prev)
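The routing after gate differs only in the two write_trajectory_* states, which branch on the mode. The sketch below encodes that part of the graph as a lookup table and walks it. The state names come from the graph above; the `"next"` label for unconditional transitions and the `trace` helper are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (state, verdict) -> next state, transcribed from the graph above.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("gate", "target"): "commit_and_log",
    ("gate", "progress"): "commit_and_log",
    ("gate", "stall"): "revert_and_log",
    ("gate", "error"): "revert_and_log",
    ("commit_and_log", "next"): "write_trajectory_accepted",
    ("revert_and_log", "next"): "write_trajectory_rejected",
    ("write_trajectory_accepted", "yes"): "check_queue",   # state mode
    ("write_trajectory_accepted", "no"): "capture_prev",   # whole-file mode
    ("capture_prev", "next"): "propose",
    ("write_trajectory_rejected", "yes"): "check_queue",   # state mode
    ("write_trajectory_rejected", "no"): "done",           # whole-file mode
}
MODE_ROUTED = {"write_trajectory_accepted", "write_trajectory_rejected"}
STOP_AT = {"check_queue", "propose", "done"}


def trace(gate_verdict: str, state_mode: bool) -> List[str]:
    """Follow the graph from gate until the next iteration, the next state, or done."""
    path = ["gate"]
    while path[-1] not in STOP_AT:
        current = path[-1]
        if current == "gate":
            verdict = gate_verdict
        elif current in MODE_ROUTED:
            verdict = "yes" if state_mode else "no"
        else:
            verdict = "next"
        path.append(TRANSITIONS[(current, verdict)])
    return path
```

Tracing a stall in whole-file mode ends at done, while the same stall in state mode ends at check_queue, which matches the two branches shown under write_trajectory_rejected.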

Trajectory

Each iteration appends one JSON line to .ll/runs/harness-optimize/<run-id>/states/<state>/trajectory.jsonl:

{"iter": 3, "score": 0.82, "accepted": true, "commit_sha": "abc1234"}
{"iter": 4, "score": 0.79, "accepted": false, "commit_sha": ""}

In whole-file mode <state> is whole-file. In state mode <state> is the name of the state being optimized (e.g. propose, apply). The <run-id> is a nanosecond timestamp captured by init_run.
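A small sketch of the path layout and record format may help when writing tooling around trajectories. `trajectory_path`, `new_run_id`, and `append_record` are hypothetical helpers; the loop itself writes these records from its own states, and blanking `commit_sha` for rejected iterations is inferred from the sample records above.

```python
import json
import time
from pathlib import Path


def new_run_id() -> str:
    """A nanosecond timestamp, as captured by init_run (exact format is an assumption)."""
    return str(time.time_ns())


def trajectory_path(root: Path, run_id: str, state: str = "whole-file") -> Path:
    """Build <root>/.ll/runs/harness-optimize/<run-id>/states/<state>/trajectory.jsonl."""
    return (
        root / ".ll" / "runs" / "harness-optimize" / run_id
        / "states" / state / "trajectory.jsonl"
    )


def append_record(path: Path, iteration: int, score: float,
                  accepted: bool, commit_sha: str = "") -> None:
    """Append one JSON line; rejected iterations carry an empty commit_sha."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "iter": iteration,
        "score": score,
        "accepted": accepted,
        "commit_sha": commit_sha if accepted else "",
    }
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

Opening the file in append mode keeps earlier iterations intact, so a reader can replay the whole run line by line.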

Resume Behavior

On resume, load_directive reads the trajectory and checks out the best-scoring accepted commit's files before re-running the baseline. It also re-reads .ll/program.md to capture the Directive prose, ensuring the LLM proposal step has the optimization goal available even after a handoff. The run continues from the best known state, not the last attempted state.
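Selecting the best-scoring accepted commit from a trajectory can be sketched as below. `best_accepted` is a hypothetical helper, and keeping the earliest record on a tie is an assumption; how load_directive breaks ties is not stated here.

```python
import json
from typing import Iterable, Optional


def best_accepted(lines: Iterable[str]) -> Optional[dict]:
    """Return the accepted record with the highest score, or None if there is none."""
    best: Optional[dict] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if not record.get("accepted"):
            continue  # rejected iterations have no commit to restore
        if best is None or record["score"] > best["score"]:
            best = record  # strict '>' keeps the earliest record on ties (assumption)
    return best
```

Given the two sample records shown in the Trajectory section, the function returns the iteration 3 record, whose `commit_sha` is the one a resume would restore.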

Scorer Contract

The scorer command must follow the Harbor scorer protocol:

  • Exit 0 + bare float on stdout → yes (accepted score)
  • Exit 0 + non-float stdout → error
  • Exit non-zero → no
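The exit-code and stdout rules above map onto a small classifier. This is a sketch under stated assumptions: `classify_scorer_result` is a hypothetical name, "bare float" is taken to mean that the whole of stdout, after trimming whitespace, parses as a float, and treating NaN or infinity as an error is a choice made for this example.

```python
import math
from typing import Optional, Tuple


def classify_scorer_result(exit_code: int, stdout: str) -> Tuple[str, Optional[float]]:
    """Map a scorer's exit code and stdout to (verdict, score)."""
    if exit_code != 0:
        return "no", None  # non-zero exit
    try:
        value = float(stdout.strip())
    except ValueError:
        return "error", None  # exit 0 but stdout is not a bare float
    if not math.isfinite(value):
        return "error", None  # assumption: NaN and infinity are not usable scores
    return "yes", value
```

A scorer that prints a label alongside the number, such as `score: 0.82`, lands in the error branch, so custom scorers should send diagnostics to stderr and only the number to stdout.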

Dependencies

Imports lib/benchmark.yaml for the run_benchmark fragment.


deep-research

Category: research
File: scripts/little_loops/loops/deep-research.yaml

Iterative web research synthesis loop. Accepts a research topic or question, generates an initial set of faceted search queries, performs web searches, evaluates and deduplicates sources, scores per-facet coverage, and iterates until coverage is sufficient or max_iterations is exhausted. Produces a structured Markdown report with executive summary, key findings, source table, coverage gaps, and conclusion.

Invocation

# Basic — positional arg injected into context.topic via input_key: topic
ll-loop run deep-research "What are the trade-offs of CRDT vs OT for collaborative editing?"

# Deeper research with higher coverage target
ll-loop run deep-research "your research topic" \
  --context depth=5 \
  --context coverage_threshold_pct=90

# Custom output directory
ll-loop run deep-research "your topic" \
  --context output_dir=.loops/my-research

Context Variables

| Variable | Default | Description |
|---|---|---|
| topic | "" | Required. Research question or topic (injected from positional arg via input_key: topic). |
| output_dir | .loops/research | Directory where per-run subdirectories are created. |
| depth | 3 | Minimum number of search rounds before accepting convergence. |
| coverage_threshold_pct | 85 | Target coverage percentage; surfaced in the score_coverage prompt. |

State Graph

init  (shell: slug topic, mkdir, touch 4 artifact files, capture run_dir)
  → generate_queries  (prompt: write 3–5 faceted queries to query-log.md;
                       initialize coverage.md with facet list)
    → search_web  (prompt: WebSearch/WebFetch; append findings + [Source: <url>] to knowledge-base.md)
      → evaluate_sources  (prompt: score relevance/credibility, deduplicate, mark LOW-QUALITY)
        → score_coverage  (prompt: score facets 1–5, update coverage.md;
                           emit COVERAGE_SUFFICIENT or NEED_MORE)
          on_yes (COVERAGE_SUFFICIENT) → synthesize
          on_no  (NEED_MORE)           → plan_next
          on_error                     → synthesize  (graceful degradation)
  plan_next  (prompt: generate gap-filling queries, append to query-log.md)
    → search_web  (loop back)
  synthesize  (prompt: consolidate knowledge-base.md into structured report.md)
    → done  (terminal: report final output paths and facet scores)

Output Artifacts

All artifacts are written to ${context.output_dir}/<slug>/ where <slug> is a lowercase, hyphenated form of context.topic:
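One plausible way to derive such a slug is shown below. The loop's init state does this in shell, and its exact rules (allowed characters, any length limit) are not documented here, so `slugify` is a hypothetical approximation rather than the loop's own algorithm.

```python
import re


def slugify(topic: str) -> str:
    """Lowercase the topic and join alphanumeric runs with single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower())
    return slug.strip("-")  # drop hyphens left by leading or trailing punctuation
```

Under these assumed rules, the topic from the first invocation example becomes `what-are-the-trade-offs-of-crdt-vs-ot-for-collaborative-editing`.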

| File | Description |
|---|---|
| report.md | Primary output: executive summary, key findings, source table, coverage gaps, conclusion |
| knowledge-base.md | Accumulated findings with [Source: <url>] (relevance: N/5, credibility: N/5) annotations |
| coverage.md | Per-facet coverage scores (1–5) updated each iteration; includes iteration count and average |
| query-log.md | All search queries grouped by iteration (## Iteration N blocks) |

Convergence

score_coverage uses the inline sentinel pattern (Option A, rn-plan-style):

  • Emits COVERAGE_SUFFICIENT when: average facet score ≥ 4.0 AND iteration ≥ depth
  • Emits NEED_MORE otherwise
  • on_error routes to synthesize (write what we have; don't stall)
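The sentinel rule above reduces to one comparison on the average and one on the iteration count. In the loop this decision is made inside the score_coverage prompt; the function below is only a sketch of the rule, and `coverage_sentinel` and its parameter names are hypothetical.

```python
from statistics import mean
from typing import Dict


def coverage_sentinel(facet_scores: Dict[str, float], iteration: int,
                      depth: int = 3, min_average: float = 4.0) -> str:
    """Return COVERAGE_SUFFICIENT when the average facet score and round count both qualify."""
    if not facet_scores:
        return "NEED_MORE"  # assumption: no scored facets means more research is needed
    average = mean(list(facet_scores.values()))
    if average >= min_average and iteration >= depth:
        return "COVERAGE_SUFFICIENT"
    return "NEED_MORE"
```

Both conditions must hold, so a run with uniformly high facet scores still continues until it has completed at least depth search rounds.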

Knowledge accumulation: knowledge-base.md appends across iterations (sources accumulate); coverage.md overwrites each iteration (only latest score matters for routing).
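The append-versus-overwrite distinction comes down to the file mode used for each artifact. The sketch below shows the idea with a hypothetical `record_iteration` helper; in the loop itself these writes are performed by the prompt-driven states.

```python
from pathlib import Path


def record_iteration(run_dir: Path, findings: str, coverage: str) -> None:
    """Append findings to knowledge-base.md and replace coverage.md with the latest scores."""
    # Append mode: sources from earlier iterations are preserved.
    with (run_dir / "knowledge-base.md").open("a", encoding="utf-8") as kb:
        kb.write(findings.rstrip("\n") + "\n")
    # write_text truncates first: only the most recent coverage scores remain.
    (run_dir / "coverage.md").write_text(coverage, encoding="utf-8")
```

After two calls, knowledge-base.md holds both sets of findings while coverage.md holds only the second set of scores, which is all that routing needs.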