
57 Tool Calls, 0 Lines of Code: Claude Code as a Multi-Project Orchestrator

57 tool calls. 5 sessions. 16 minutes of total wall-clock time. Zero lines of code written.

That’s today’s Claude Code log — one AI agent running across two live projects: SpoonAI (a Korean AI news platform) and a dental advertising research stack. The work spanned large JSON processing, SERP hypothesis validation, HTML report generation, and catching temporal tense errors that a single model had quietly introduced over time.

This isn’t a Claude Code tutorial. It’s a real snapshot of what multi-project AI orchestration looks like when it’s running in production.

TL;DR — When you treat Claude Code as an orchestrator rather than a code editor, a full day of cross-project research compresses into 16 minutes of runtime. The interesting parts are the failure modes — and how the system corrects itself.

How to Process a Massive JSON File Without Writing a Single Script

Session 1: daily AI intelligence collection for the SpoonAI newsroom. The source was /Users/jidong/spoonai/crawl/newsite/2026-05-23-daily-intel-raw.json — an aggregated crawl from RSS feeds, APIs, arXiv, GitHub, and HuggingFace.

The moment Claude attempted the first Read, it self-assessed: the file was too large to read in one shot. It pivoted to chunked reading — 6 sequential Read calls to ingest the full file — then produced two outputs:

  • 2026-05-23-daily-intel.md — editorial summary for the content team
  • 2026-05-23-daily-intel.json — structured candidate data for downstream use

Total session time: 4 minutes. No preprocessing script. No pipeline code. Just read, reason, write. The model decided its own chunking strategy and executed it.

This pattern — let the model self-organize around data it can’t fit in one pass — works reliably when the output format is well-defined. The editorial summary and JSON outputs were specified upfront, so Claude had a clear target to write toward across the chunks.
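For readers who want the same chunking behavior outside an agent, here is a minimal sketch of a windowed read. It assumes the file is paged by line offset and limit; the function names and the window size are hypothetical and do not come from the session log.

```python
from typing import Iterable, Iterator


def line_windows(total_lines: int, window: int) -> Iterator[tuple[int, int]]:
    """Yield (offset, limit) pairs that cover total_lines with no overlap."""
    if window <= 0:
        raise ValueError("window must be positive")
    for offset in range(0, total_lines, window):
        yield offset, min(window, total_lines - offset)


def chunk_lines(lines: Iterable[str], window: int) -> Iterator[list[str]]:
    """Group an iterable of lines into lists of at most `window` lines."""
    if window <= 0:
        raise ValueError("window must be positive")
    chunk: list[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == window:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk
```

Passing an open file handle to chunk_lines streams the file without loading it whole, which is the property that matters when a single read is too large.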

Testing a Dental SERP Hypothesis Across Three New Keywords

Session 2 was the heaviest: 9 minutes, 29 tool calls. The task was to expand and validate Hypothesis 31 in a Korean dental advertising SERP research knowledge base.

Hypothesis 31 (from a prior session): “Keywords involving dental procedure side effects show neither paid ad links nor medical review badges in search results.”

The original test keyword was 라미네이트 부작용 (laminate veneer side effects). Today’s expansion:

  • 임플란트 부작용 — implant side effects
  • 교정 부작용 — orthodontic side effects
  • 수면치료 위험성 — sleep treatment risks

Result: hypothesis confirmed and strengthened. Paid links (파워링크) and review badges (심의필) were absent from the search results for all three new keywords.
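The check behind that result can be sketched as a small predicate over per-keyword observations. The SerpObservation shape and its field names are assumptions made for illustration; the post does not show the real schema. The three records mirror the outcome reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerpObservation:
    """Hypothetical record of what was seen on one results page."""
    keyword: str
    has_paid_links: bool    # 파워링크 present
    has_review_badge: bool  # 심의필 present


def supports_hypothesis_31(obs: SerpObservation) -> bool:
    """Hypothesis 31 holds when neither paid links nor review badges appear."""
    return not obs.has_paid_links and not obs.has_review_badge


def counterexamples(observations: list[SerpObservation]) -> list[str]:
    """Return the keywords whose results contradict the hypothesis."""
    return [o.keyword for o in observations if not supports_hypothesis_31(o)]


# The three new keywords, with the outcome described in the post.
observations = [
    SerpObservation("임플란트 부작용", False, False),
    SerpObservation("교정 부작용", False, False),
    SerpObservation("수면치료 위험성", False, False),
]
print(counterexamples(observations))  # prints []
```

An empty list means no keyword contradicted the hypothesis. A single counterexample would be enough to send it back for revision.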

An unexpected signal appeared mid-session. Naver announcement ID 31768 showed up at the top of the results list — a new entry not present in previous collection runs. Claude detected this by analyzing summary.json and flagged it in the relevant hypothesis sections without prompting.
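Spotting a new entry like this comes down to comparing the current run against the previous one. The sketch below assumes each run can be reduced to a list of notice IDs; the actual structure of summary.json is not shown in the post, and every ID other than 31768 is a placeholder.

```python
from typing import Iterable


def new_notice_ids(previous: Iterable[int], current: Iterable[int]) -> list[int]:
    """Return IDs seen in the current run but not the previous one, in current order."""
    seen = set(previous)
    fresh: list[int] = []
    for notice_id in current:
        if notice_id not in seen:
            fresh.append(notice_id)
            seen.add(notice_id)  # do not report a repeated ID twice
    return fresh


# Placeholder IDs for illustration; only 31768 comes from the session log.
previous_run = [1001, 1002]
current_run = [31768, 1001, 1002]
print(new_notice_ids(previous_run, current_run))  # prints [31768]
```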

The session flow:

1. Read target files + previous day's update (continuity check)
2. Write and run collect_2026_05_23.py
3. Analyze summary.json → detect new notice 31768
4. Attempt to update 5 markdown files → blocked by orchestration hook
5. Self-reclassify complexity → proceed

Step 4 hit a snag. The workflow gate blocked the Edit/Write operations. The orchestration hook had classified the task as standard, which requires a planning phase before file modifications. But these were purely research file updates, with no code changes and no production impact.

Claude recognized this, reclassified the task as trivial, and the gate passed it through. No external intervention needed.

This kind of friction is expected with strict orchestration hooks. The heuristic can’t always distinguish “updating a research markdown log” from “modifying source code.” The fix is self-correction — and it worked cleanly here.
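One way a hook could make that distinction explicit is to classify by the files an operation touches. This is a sketch under stated assumptions, not the hook used in these sessions: the suffix list and the classify_edit name are hypothetical, and only the standard and trivial labels come from the log.

```python
from pathlib import PurePosixPath

# Assumption: research notes and reports live in these file types.
RESEARCH_SUFFIXES = {".md", ".html"}


def classify_edit(paths: list[str]) -> str:
    """Return 'trivial' only when every touched file is a research document.

    Anything else, including an empty list, falls back to 'standard'
    so that the planning phase stays mandatory by default.
    """
    if not paths:
        return "standard"
    for raw in paths:
        if PurePosixPath(raw).suffix.lower() not in RESEARCH_SUFFIXES:
            return "standard"
    return "trivial"


print(classify_edit(["naver-ranking-hypotheses.md"]))  # prints trivial
print(classify_edit(["collect_2026_05_23.py"]))        # prints standard
```

Defaulting to standard keeps the gate conservative: a misclassified research edit costs a planning step, while a misclassified code edit could skip one.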

Tool call breakdown: Bash 14, Read 6, Edit 5, Write 3, ToolSearch 1.

Why the HTML Report Deserved Its Own Session

Session 3 ran for 3 minutes and used 13 tool calls, producing exactly one artifact: the HTML report that session 2 had left unfinished.

The split was deliberate. Session 2 had already accumulated 29 tool calls. Generating a large output document — which requires re-reading all source files to stay coherent — in a long-context session adds error risk. Starting fresh with a clean context window is more reliable.

Output: reports/2026-05-23-risk-word-info-keyword-ad-gap.html — mobile-friendly layout, 8 sections, 24KB.

The report was built under Korean medical advertising law constraints: no specific hospital names or addresses, no CPC/CTR/ROAS figures. Evidence labels were applied throughout:

  • [Official] — from official sources
  • [Public SERP Observation] — directly observed in search results
  • [Working Hypothesis] — working assumption under active testing
  • [Needs Verification] — not yet confirmed
  • [Unconfirmed Numbers] — estimates, not measured data
  • [Estimated High-Spend] — inferred from observable signals

Labeling like this isn’t just compliance. It makes the research document auditable — any reader can immediately tell what’s observed versus inferred versus pending.

The 1-Minute Surgical Fix

Session 4 was a quick patch: one missing label in the HTML generated by session 3.

The [Estimated High-Spend] label was absent from a section that required it. Scope: one file, one sentence.

Read → Edit → Bash (file size check) → Grep (label presence check)

4 tool calls. File confirmed at 24,974 bytes. Both [Estimated High-Spend] and [Unconfirmed Numbers] labels verified present. Session time: under 1 minute.
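The same two checks (size and label presence) are easy to script. This sketch assumes the labels appear in the file literally as written in this post, which may not match the real report; the function names are hypothetical.

```python
REQUIRED_LABELS = ("[Estimated High-Spend]", "[Unconfirmed Numbers]")


def missing_labels(html_text: str, required: tuple[str, ...] = REQUIRED_LABELS) -> list[str]:
    """Return the required labels that do not occur anywhere in the text."""
    return [label for label in required if label not in html_text]


def byte_size(text: str) -> int:
    """Size of the text once encoded as UTF-8, comparable to a file size check."""
    return len(text.encode("utf-8"))


sample = "<p>[Estimated High-Spend] ad slots near the top of the page</p>"
print(missing_labels(sample))  # prints ['[Unconfirmed Numbers]']
```

An empty result from missing_labels corresponds to the Grep step passing.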

This is what clean session scoping enables. When each session has a tight objective, the review step afterward is fast and the patches are surgical.

The Date Tense Errors a Single Model Quietly Introduced

The final session applied results from a Codex read-only cross-verification pass. Codex reviewed naver-ranking-hypotheses.md and flagged two blocking issues — both temporal tense problems.

Error 1 — Around line 530. The text described a slot expansion “in two phases on 5/14 and 5/28” as a completed fact. But 5/28 was still in the future at the time of writing.

Error 2 — Around line 536. The phrase “after the two-phase expansion” treated the 5/28 date as already past.

The corrections:

Before: "5/14·5/28 두 단계로 슬롯이 확대되었다"
        (slots were expanded in two phases on 5/14 and 5/28)
After:  "5/14 적용 + 5/28 예정 두 단계로 슬롯 확대 흐름이 진행 중이다"
        (slot expansion is in progress: 5/14 applied, 5/28 scheduled)

Before: "두 단계 확장 이후"
        (after the two-phase expansion)
After:  "5/28 적용 이후에는"
        (after the 5/28 rollout)

Edit 2, Grep 2, Read 1. 5 tool calls.

Temporal tense drift in accumulated research logs is a known failure mode for single-model, long-document contexts. When a model generates text across dates — some past, some future — over weeks of sessions, it can lose its anchor on what “today” means. A second model reviewing the document cold has no accumulated bias and catches these cleanly.
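A cheap first filter for this kind of drift is to list every date in a passage that is still in the future relative to the writing date. The sketch below assumes dates are written as month/day and belong to the current year; it only surfaces candidates, and judging the tense of the surrounding sentence still needs a reviewer.

```python
import re
from datetime import date

# Matches month/day such as 5/28, but not the tail of a longer date like 2026/05/23.
DATE_PATTERN = re.compile(r"(?<![\d/])(\d{1,2})/(\d{1,2})(?![\d/])")


def future_dates(text: str, today: date) -> list[date]:
    """Return dates mentioned in text that fall after `today` (same year assumed)."""
    found: list[date] = []
    for month, day in DATE_PATTERN.findall(text):
        try:
            candidate = date(today.year, int(month), int(day))
        except ValueError:
            continue  # not a real calendar date, e.g. 13/40
        if candidate > today:
            found.append(candidate)
    return found


before = "5/14·5/28 두 단계로 슬롯이 확대되었다"
print(future_dates(before, date(2026, 5, 23)))  # prints [datetime.date(2026, 5, 28)]
```

Run against the original sentence with the log's date, it flags 5/28, the date that had been written as already complete.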

This is the concrete value of cross-verification with an external model: not catching logic bugs (those are harder to spot mechanically), but catching the quiet tense drift that accumulates when one model owns a growing document over time.

Tool Call Breakdown

| Tool       | Count |
|------------|-------|
| Read       | 20    |
| Bash       | 18    |
| Edit       | 8     |
| Grep       | 6     |
| Write      | 4     |
| ToolSearch | 1     |
| Total      | 57    |

Read leading at 20 reflects a “read-before-modify” discipline — Claude consistently loaded and understood existing context before making any changes. Bash at 18 covered script execution, file size verification, and search operations across sessions.
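The breakdown can be sanity-checked with a few lines of arithmetic. The counts are the ones reported above; grouping Read and Grep as inspection and Edit and Write as mutation is an assumption made here for illustration.

```python
TOOL_CALLS = {
    "Read": 20,
    "Bash": 18,
    "Edit": 8,
    "Grep": 6,
    "Write": 4,
    "ToolSearch": 1,
}


def total_calls(calls: dict[str, int]) -> int:
    """Sum the per-tool counts."""
    return sum(calls.values())


def group_total(calls: dict[str, int], tools: tuple[str, ...]) -> int:
    """Sum the counts for a named subset of tools."""
    return sum(calls[name] for name in tools)


print(total_calls(TOOL_CALLS))                      # prints 57
print(group_total(TOOL_CALLS, ("Read", "Grep")))    # prints 26
print(group_total(TOOL_CALLS, ("Edit", "Write")))   # prints 12
```

Under that grouping, inspection calls outnumber mutation calls by roughly two to one, which fits the read-before-modify pattern.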

Three Patterns Worth Taking Forward

Session splits are a quality control mechanism. Moving the HTML report from session 2 to session 3 wasn’t just for organization — it was the safer choice. Long-context sessions accumulate error risk on large outputs. Find natural cut points and split there; the fresh context pays for itself.

Orchestration hooks classify intent, not content. When the complexity label is wrong, the hook blocks valid work. Claude self-correcting from standard to trivial is correct behavior — but it reveals that hook heuristics need to distinguish “editing a research markdown file” from “editing production code.” That distinction is worth building into the hook logic explicitly.

Cross-verification is most valuable for temporal reasoning. A single model maintaining a long-running document will eventually write a future event as completed fact. It’s not hallucination in the dramatic sense; it’s subtle tense drift. A second model reviewing the same document cold will catch it. Today’s Codex review found two such errors in a single pass.

The Day in Numbers

  • Total sessions: 5
  • Total tool calls: 57
  • Files created: 3 (2026-05-23-daily-intel.md, collect_2026_05_23.py, HTML report)
  • Files modified: 4
  • Code commits: 0
  • Lines of code written: 0

The 0-commit, 0-code day isn’t unusual when Claude Code runs as a research orchestrator. The value isn’t in code output — it’s in structured documents, knowledge base continuity, and verification catches that keep accumulated research accurate over time.


More projects and build logs at jidonglab.com
