josh

  • Home
  • Lab Notes
  • Notebook
  • Work With Me
  • Portfolio
  • About
  • Contact
© 2026 Josh Peterson

Lab.

Agent Runs Are Not Done Until They Leave A Trail

Josh Peterson
Josh Peterson
Cover Image for Agent Runs Are Not Done Until They Leave A Trail
Josh Peterson
Josh Peterson
June 11, 2026

The first thrill of AI automation is watching an agent do the thing.

It opens the file, checks the source, writes the script, updates the tracker, sends the status note, or generates the packet. A task that used to sit in the ugly middle distance between "too small to outsource" and "too annoying to do manually" suddenly moves.

That part is real. It is also not enough.

The more agentic work I run, the more convinced I am that the important question is not simply whether the agent completed the task. The important question is whether the run left behind enough evidence for the next person, or the next agent, to understand what happened.

An agent run is not done when the screen looks quiet.

It is done when it leaves a trail.

The Problem With Invisible Success

Most automation demos optimize for the satisfying moment: ask for something, receive the result.

That is fine for a toy workflow. It breaks down quickly in real work, where the output is only one part of the job.

Did the agent use the right source file? Did it hit the live API or a stale export? Did it skip a step because a browser was locked? Did it silently fail on a permission error? Did it update the durable tracker, or just say it did? If it created a document, can I trace which input produced it? If it sent a status note, was that note sanitized? If it stopped halfway through, is the partial state recoverable?

These questions sound fussy until you have to resume a task three days later.

Then they become the whole game.

The worst automation is not automation that fails loudly. Loud failure is useful. You can inspect it, fix it, rerun it, or decide the workflow is not worth saving.

The dangerous kind is invisible success: the agent says the work is done, the output looks plausible, and nobody can reconstruct the path.

That is not an automation system. That is a vibes machine with write access.

Completion Is A Record-Keeping Problem

For small personal scripts, completion can be informal. A file appears in the right folder. A message gets posted. A table has new rows.

Agent workflows need a higher bar because they often cross boundaries. They move between local files, browser sessions, APIs, databases, chat threads, project trackers, cloud drives, and human approval steps. They also run in messy environments where part of the system can fail for reasons that have nothing to do with the task itself.

That means "done" has to include record keeping.

A useful agent run should answer four questions:

  • What was requested?
  • What did the agent actually touch?
  • What was verified?
  • What remains uncertain or blocked?

This does not require a giant governance program. It requires a habit of leaving durable artifacts in the places future work will look first.

For a local automation workflow, that might mean a status log in the repo, a generated report with a timestamped filename, a tracker row, a project-management comment, and a short sanitized completion note. For a reporting workflow, it might mean a source ledger, a public-record trail, a Zotero collection, and a note that distinguishes what the records show from what they do not prove. For a recurring operations workflow, it might mean queue state, run IDs, validation output, and an explicit blocker string when the environment gets in the way.

The exact shape varies. The principle does not.

If the next run cannot tell what happened in the previous run, the automation is not mature yet.

The Trail Should Be Boring

Good audit trails are not dramatic.

They are boring in the useful way. They use predictable filenames. They record timestamps. They separate inputs from outputs. They keep logs close to the workflow. They write summaries that say what happened without dumping private tokens, raw environment files, or sensitive account details. They make it obvious which step was skipped, retried, verified, or blocked.

The best trail is one a tired person can understand.

That matters because agent workflows often fail at the edges:

  • the source changed
  • the browser session expired
  • the API returned a partial result
  • a local service was not running
  • a file synced slowly
  • a path moved between cloud-storage representations
  • a command failed before reaching the real application layer
  • the agent had permission to draft but not to send

In those moments, the trail becomes the difference between a clean retry and a forensic dig.

I want the next agent to be able to read the run state and say, "This step completed. This artifact was verified. This account-touching action was not taken. This is the narrow blocker. Here is the safest next move."

That is a very different bar from "the model produced a good answer."

A Practical Checklist

When I am turning a recurring task into an agent-friendly workflow, I now think in terms of evidence before elegance.

The checklist is simple:

  1. Name the run.

    Give the workflow a stable task name, timestamp, or run ID. If multiple files are produced, they should clearly belong to the same run family.

  2. Preserve the request.

    Keep the original task brief or acceptance criteria somewhere close to the output. Future review should not depend on chat memory alone.

  3. Separate inputs, outputs, and logs.

    Do not let generated files, raw source material, and diagnostic logs blur together. A clean folder structure beats cleverness.

  4. Verify the artifact.

    If the agent creates a report, packet, tracker update, or publishable file, run the narrowest verification that proves it exists and is well formed. For code, build or type-check when practical. For data, read back counts or schema. For documents, confirm the file was created and can be opened or parsed.

  5. Write the blocker as data.

    "It failed" is not enough. A useful blocker names the exact step, the narrow error class, and whether the failure happened before or after a real application response.

  6. Record what was intentionally not done.

    This is especially important around sending, publishing, deleting, spending money, modifying accounts, or touching production. A good closeout says not only what happened, but which boundary was respected.

  7. Leave a human-readable summary.

    A short summary in a status log, tracker, or project note is often the cheapest way to make future work sane.

This is not bureaucracy for its own sake. It is how automation becomes resumable.

Agents Need Handoffs More Than Humans Do

Humans carry a lot of state in our heads. That is useful, but it is also why our personal systems get weird. We remember why a file is named strangely. We know which Slack message mattered. We know that a missing row means "waiting on approval" rather than "forgotten."

Agents do not get that luxury unless we externalize the state.

A future agent can inspect files, logs, tracker rows, and summaries. It can follow stable conventions. It can run a verification command. It can compare a generated artifact against a manifest. But if the only evidence is that a previous chat sounded confident, the next run starts on sand.

That is why the handoff is the product.

A good agent workflow is not just a prompt. It is a set of surfaces where state can live:

  • a repo for code and durable docs
  • a status log for progress and verification
  • a tracker for human-facing task state
  • a source ledger for research claims
  • a queue for pending and completed work
  • a sanitized notification channel for fast awareness
  • a memory note for reusable lessons

Not every workflow needs all of those. Most should not. But every serious workflow needs at least one durable place where the truth goes after the chat moves on.

The Public Version Of Private Work

There is a natural tension here.

The most useful agent trails often contain things that should not be public: account IDs, private file paths, source names, unpublished leads, internal notes, API errors, drafts, revenue numbers, or approval boundaries. Publishing the raw trail would be careless.

But the pattern can be public.

That is where I think a public automation lab is useful. It can show the cleaned method without exposing the private substrate. It can say: here is the shape of the workflow, here is the checklist, here is how verification works, here is what belongs in a closeout, here is how to keep an agent from confusing a browser failure with an application failure.

That kind of writing is less flashy than a demo video. It is also more honest.

Real automation is not a magic trick. It is an operating practice.

The Standard I Want

The standard I want for agentic work is not perfection. It is recoverability.

If a run succeeds, I should be able to prove what succeeded.

If a run fails, I should be able to see where and why.

If a run is partial, I should be able to resume without guessing.

If a run crosses a sensitive boundary, I should be able to show that the boundary was respected.

If a future agent picks up the work, it should inherit more than a vague instruction. It should inherit a map.

That is the difference between agent output and agent infrastructure.

Output is what the model gives you.

Infrastructure is what lets the next run trust it.

The future of AI work will not be decided only by which model can perform the longest task in one shot. It will also be decided by which workflows can leave behind enough state for humans and agents to cooperate over time.

The glamorous version of agents is autonomy.

The useful version is accountable continuity.

And accountable continuity starts with a simple rule: the run is not done until it leaves a trail.