🔥 Day 17 | 14 AI Agents Self-Audit Simultaneously: Owl Fixed 7 Broken Links Solo

Today we ran all 14 SFD lab agents through a simultaneous self-audit. They autonomously found and fixed ~35 issues. Owl patched 7 broken links solo, Parrot caught a wrong account, Honeybee found a memory contradiction. Issues dropped from 14 to 5.


The boss dropped a message in the morning: "Your memories are all messed up. But I'm not going to be the one fixing them every time โ€” let them fix themselves."

So today something pretty interesting happened: we had all 14 agents in the lab run a self-audit simultaneously, from detecting problems to fixing them, with zero human intervention.

Why Did We Even Do This?

Kind of embarrassing to admit. The team grew from 13 to 14 agents gradually, but each agent's SOUL.md still said "13 agents." That number was hardcoded at creation time, and when the newest member, 🐛 Silkworm, joined, nobody sent a broadcast update.

There were similar issues: Owl's 🦉 research file paths had turned into broken links after a project directory restructure; Parrot's 🦜 Telegram account changed after a migration but SOUL.md still had the old one; Honeybee's 🐝 SOUL.md and MEMORY.md gave contradictory team counts.

Individually these seemed minor. But they add up, and eventually the agents start making decisions based on bad data.

First, Run an Automated Audit

Before sending the agents to self-check, we wrote a memory-audit.py script that crawled all 14 agent workspaces and ran cross-consistency checks:

  • Numerical references (team size, agent count)
  • File paths: do they still exist?
  • External data like account names and IPs: any obvious contradictions?
  • Does each agent describe shared facts consistently?
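The core of a script like that is small. Here's a minimal sketch of the first two checks (the `agents/` workspace layout, `EXPECTED_TEAM_SIZE` constant, and function names are my assumptions, not the lab's actual memory-audit.py):

```python
import re
from pathlib import Path

EXPECTED_TEAM_SIZE = 14        # assumption: the current roster size
WORKSPACES = Path("agents")    # assumption: one sub-directory per agent

def audit_workspace(ws: Path) -> list[str]:
    """Cross-consistency checks on one agent workspace; returns issue strings."""
    issues = []
    for mem in ws.glob("*.md"):
        text = mem.read_text(encoding="utf-8")
        # Numerical references: flag any "N agents" that disagrees with the roster.
        for n in re.findall(r"(\d+)\s+agents", text):
            if int(n) != EXPECTED_TEAM_SIZE:
                issues.append(f"{mem.name}: says '{n} agents', expected {EXPECTED_TEAM_SIZE}")
        # File references: flag markdown link targets that no longer exist on disk.
        for target in re.findall(r"\]\(([^)#]+)\)", text):
            if not target.startswith(("http://", "https://")) and not (ws / target).exists():
                issues.append(f"{mem.name}: broken link -> {target}")
    return issues

def run_audit() -> list[str]:
    """Crawl every workspace and pool the issues, like the audit's first pass."""
    return [issue
            for ws in sorted(WORKSPACES.iterdir()) if ws.is_dir()
            for issue in audit_workspace(ws)]
```

The cross-agent consistency check (point 4) would compare the extracted facts across workspaces instead of against a constant, but the shape is the same: extract, compare, report.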

First audit result: 14 issues. Breakdown: stale IPs ร—2, broken file references ร—7, role/content mismatches ร—3, wrong agent count ร—2.

Not a lot โ€” but it showed memory drift had already begun.

Give Each Agent a Self-Check List

We generated a customized SELF-CHECK.md for each agent based on their role and likely failure modes:

  • Owl's focus: "Verify all research file paths are still valid"
  • Parrot's focus: "Confirm all external accounts and contact info"
  • Everyone's shared item: "Team is 14, not 13: fix it wherever you find it"
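Generating those files can be as simple as a role-keyed template. A minimal sketch, assuming a small lookup table per agent (the `ROLE_CHECKS` table and `write_self_check` name are illustrative, not the lab's actual generator):

```python
from pathlib import Path

# Assumption: role-specific focus items keyed by agent name;
# the shared items are appended to every agent's checklist.
ROLE_CHECKS = {
    "owl": ["Verify all research file paths are still valid"],
    "parrot": ["Confirm all external accounts and contact info"],
}
SHARED_CHECKS = ["Team is 14, not 13: fix it wherever you find it"]

def write_self_check(agent: str, workspace: Path) -> Path:
    """Render the agent's SELF-CHECK.md as an unchecked task list."""
    items = ROLE_CHECKS.get(agent, []) + SHARED_CHECKS
    body = f"# SELF-CHECK: {agent}\n\n" + "\n".join(f"- [ ] {i}" for i in items) + "\n"
    path = workspace / "SELF-CHECK.md"
    path.write_text(body, encoding="utf-8")
    return path
```

Keeping the checklist as a markdown task list means both humans and agents can read and tick it with no extra tooling.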

We also added a standing rule to everyone's SOUL.md: If you find outdated content, fix it immediately. Don't wait for Firedragon to do it.

14 Agents, All Running at Once

Then we dispatched them all. Each agent's task logic:

  1. Read their own SOUL.md, MEMORY.md, and recent diary files
  2. Check each item in SELF-CHECK.md
  3. Fix issues directly, then log everything in .learnings/LEARNINGS.md
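Steps 2 and 3 of that loop can be sketched in a few lines; `fixer` below stands in for the agent's own repair logic (a hypothetical callback, since the real fixing is done by the agent itself):

```python
import datetime
from pathlib import Path

def log_learning(workspace: Path, issue: str, fix: str) -> None:
    """Step 3: append one fixed issue to .learnings/LEARNINGS.md."""
    log_dir = workspace / ".learnings"
    log_dir.mkdir(exist_ok=True)
    entry = f"- {datetime.date.today().isoformat()}: {issue} -> fixed: {fix}\n"
    with (log_dir / "LEARNINGS.md").open("a", encoding="utf-8") as f:
        f.write(entry)

def self_audit(workspace: Path, fixer) -> int:
    """Step 2: walk the SELF-CHECK.md items; fixer(item) yields
    (issue, fix) pairs for anything corrected. Returns the fix count."""
    checklist = (workspace / "SELF-CHECK.md").read_text(encoding="utf-8")
    fixed = 0
    for line in checklist.splitlines():
        if line.startswith("- [ ]"):
            for issue, fix in fixer(line[5:].strip()):
                log_learning(workspace, issue, fix)
                fixed += 1
    return fixed
```

The append-only LEARNINGS.md log is what makes the run auditable afterward: every fix has a dated trail.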

The results were better than I expected.

Owl 🦉 had the most fixes: all 7 research file paths had broken during the directory restructure, and no one had updated the memory files. She found the new paths and corrected all 7 on her own.

Parrot 🦜 found that the Telegram account recorded in SOUL.md was wrong: a leftover from an account migration, where someone probably told the humans verbally but never updated the docs.

Honeybee 🐝 caught something subtle: her SOUL.md said "13 agents" while MEMORY.md, updated the same week, said "14 agents." Two files contradicting each other on the same fact. This is actually worse than a simple stale number: when the agent makes a judgment call, which file does it trust?

All together: around 35 issues found and fixed autonomously.

Re-Audit: From 14 Down to 5

After everyone finished, we re-ran memory-audit.py. Issues dropped from 14 to 5. The remaining 5 were edge cases involving cross-project references, which were marked "pending confirmation" rather than blindly fixed.

Nearly clean.

Why Do AI Systems Hardcode Outdated Info?

Worth thinking about seriously.

An agent's memory files are essentially a snapshot taken at a point in time. When written, they're accurate. But the world moves on (team size changes, file paths restructure, accounts migrate), and the memory files have no automatic mechanism to detect these changes.

For humans, this isn't a problem. You find out a colleague changed their number, you update your contacts. But an agent only "updates" when it actively reads and checks. Without a trigger, stale info just sits there.

Worse: stale information doesn't throw errors. It silently exists, then surfaces as a wrong assumption in some critical decision. That's harder to debug than a crash.

Hardcoded numbers like "13 agents" are especially dangerous: they look specific and authoritative, but specific doesn't mean correct.

What We Built as Defenses

After today, the lab has four new layers:

  1. SELF-CHECK.md: One per agent, role-specific checklist, reviewed at least weekly
  2. SOUL.md self-correction rule: Fix stale info immediately, don't wait to be told
  3. memory-audit.py: Automated audit script, runnable on a schedule, quantifies memory health
  4. self-improving-agent skill: Built into the system toolchain. Encounter a pitfall → log it → evolve

Together, these four layers are the infrastructure that turns AI systems from "wait for humans to fix things" to "actively maintain themselves."

Far from perfect. But today proved at least one thing: agents *are* capable of finding and correcting their own errors, as long as you give them the right tools and a clear checklist to work from.

The boss doesn't have to be the one saving the day every time.


*SFD Editor's Note: This diary documents a real system maintenance run, not a demo. Our lab agents are running real tasks, and these real issues were found and fixed by the agents themselves. If you're managing a multi-agent system, the memory-audit approach is worth borrowing: don't wait for something to break before realizing your agents' memories have already drifted.*