
6 Reads, Zero Blocking Issues: Running Compliance QC on AI-Generated Dental Ad Research

Six tool calls. Zero code changes. That’s the entire session.

And yet it’s worth logging — because “nothing went wrong” and “nothing happened” are not the same thing.

TL;DR: Used Claude Opus 4.7 to QC six dental ad research files generated by an automated Naver SERP tracking pipeline. A single focused prompt checked for hospital data leakage (keeping clinic identifiers out is a hard compliance requirement under Korean medical advertising law), contradictions, missing labels, and unsupported claims. All six files passed. Zero blocking issues.

The Constraint That Shapes the Whole Pipeline

One exposed hospital name in an HTML report is a medical law violation in Korea.

Korean medical advertising law (의료법, the Medical Service Act) prohibits specific clinic identification in certain research and advertising contexts. Running an automated pipeline that generates daily SERP analysis means this constraint needs to be enforced systematically, not eyeballed. The pipeline architecture enforces it upstream; QC verifies that the architecture held.

This isn’t theoretical risk. It’s the sharpest constraint in the system.

What Gets Reviewed Each Day

The pipeline tracks Naver Place, blog rankings, and SERP patterns for dental keywords. Each day it generates several research files. Today’s batch:

| File | Purpose |
| --- | --- |
| 2026-05-14-daily-update.md | Fresh SERP observations for the day |
| rolling-knowledge-base.md | Accumulated patterns across weeks |
| source-index.md | Source attribution for all claims |
| competitive-serp-observations.md | Competitor SERP pattern tracking |
| naver-ranking-hypotheses.md | Ranking mechanism hypotheses |
| 2026-05-14-place-ad-application-day-serp-pattern.html | HTML summary report |

The HTML report carries the highest risk — it’s the output most likely to be shared outside the pipeline. The others are internal working documents. QC checks all six, but the HTML gets the most scrutiny.

Four criteria determine pass or fail:

  1. Contradictions — does any data conflict with prior observations without being flagged?
  2. Missing required labels — are all entries properly sourced or tagged?
  3. Unsupported claims — is every assertion backed by observation data?
  4. Hospital/address leakage — does the HTML report contain specific clinic identifiers?

The Prompt That Did the Work

I handed Claude Opus 4.7 the file paths with a tight instruction:

Read the updated daily research files for 2026-05-14 and review for blocking issues only:
contradictions, missing required labels, unsupported claims,
or specific hospital/address leakage in the HTML report.
Return OK if no blocking issues, otherwise list exact fixes.

“Blocking issues only” is load-bearing here. Without it, the model generates improvement suggestions, style notes, and recommendations. Those are useful in a review session — but this isn’t a review session. This is a pass/fail gate. The output needs to be: OK, or a list of specific things to fix.

When you leave a QC prompt open-ended, you end up triaging the model’s output instead of making a ship decision. Tight prompts produce tight outputs. The model should help you make a call, not add more things to read.

“List exact fixes” is the other key phrase. Not “areas for improvement.” Not “consider reviewing.” If there’s an issue, I need a file path and a specific correction. Vague feedback at a QC gate is functionally the same as no feedback.
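A pass/fail gate like this is easy to wire into a script. The sketch below is one possible way to turn the model's reply into a ship decision, not the pipeline's actual code. The `GateResult` type, the `parse_gate_reply` function, and the convention that a passing reply is exactly "OK" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    passed: bool
    fixes: list[str] = field(default_factory=list)


def parse_gate_reply(reply: str) -> GateResult:
    """Turn a QC reply into a ship/no-ship decision.

    Assumed convention (hypothetical): a passing reply is exactly "OK",
    ignoring case and surrounding whitespace. Anything else is treated as
    a list of fixes, one per non-empty line.
    """
    text = reply.strip()
    if text.upper() == "OK":
        return GateResult(passed=True)
    # Strip common bullet markers so each fix is a plain sentence.
    fixes = [line.strip("-* \t") for line in text.splitlines() if line.strip()]
    return GateResult(passed=False, fixes=fixes)
```

Note that an empty or malformed reply fails the gate. For a compliance check, failing closed seems like the safer default.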

Six Read calls, all six files, nothing else. No exploratory reads, no tangential lookups. The session stayed exactly within the declared scope.

What the Verification Found

Hospital leakage — clean.

The HTML report uses keyword-level identifiers throughout. Every clinic reference appears as part of a search query label. No registered entity names, no addresses, no doctor identifiers.

Claude specifically checked the boundary case: geographic terms (Gangnam, Cheongdam, Seocho) combined with specialty terms (dental, implant, laminate, whitening). These can look superficially similar to shortened clinic names in abbreviated form. All confirmed as query strings, not entity references.
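The boundary case can also be pre-screened deterministically before the model looks at it. The following is a minimal sketch under stated assumptions: the term sets are just the examples named above, the function names are hypothetical, and "Smile" in the usage note is a made-up token, not a real clinic. It only narrows down which labels deserve contextual review; it does not replace that review.

```python
# Example term sets taken from the boundary case described above.
# A real deployment would need the full tracked vocabulary (assumption).
GEO_TERMS = {"gangnam", "cheongdam", "seocho"}
SPECIALTY_TERMS = {"dental", "implant", "laminate", "whitening"}


def is_pure_query_label(label: str) -> bool:
    """True if every token is a known geographic or specialty term."""
    tokens = label.lower().split()
    return bool(tokens) and all(
        t in GEO_TERMS or t in SPECIALTY_TERMS for t in tokens
    )


def labels_needing_review(labels: list[str]) -> list[str]:
    """Return labels containing any token outside the known vocabulary."""
    return [label for label in labels if not is_pure_query_label(label)]
```

For example, `labels_needing_review(["Cheongdam laminate", "Gangnam Smile dental"])` keeps only the second label, because "Smile" is outside the known vocabulary and might be part of an entity name.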

The pipeline architecture is why this check consistently passes: from the data collection step, the system is designed to capture and store keyword-level labels rather than entity-level data. The constraint is enforced upstream, not just caught in QC. QC is verification that the upstream constraint held.

Contradictions — one interesting finding, not a blocker.

Across 10 test samples, the Cheongdam laminate query returned zero Naver Place results. The same query, checked against external platform data, showed six clinic appearances.

This looks like a contradiction on the surface. It isn’t. Naver Place ad visibility and external platform visibility are decoupled: they’re influenced by different signals and tracked through different mechanisms. The daily update logged the gap as an observation instead of treating the two counts as measurements of the same thing. The rolling knowledge base has a hypothesis section covering exactly this: Place ad exposure and external platform exposure move independently.

The verification confirmed the files handle this correctly. The gap is logged as a SERP pattern observation, cross-referenced against the independence hypothesis in the knowledge base. Not a contradiction — data that supports an existing hypothesis. This is the kind of nuance that makes a good QC check valuable: it has to distinguish between a real inconsistency and a documented observation.

Label coverage — clean.

Every observation entry in the daily update and competitive observations files carries either a source link or an observation session ID. The source index maps session IDs to session metadata. Full traceability chain intact.
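The traceability rule lends itself to a mechanical check. Here is a rough sketch, assuming observation entries are available as dictionaries with optional `source_url` and `session_id` keys and that the source index is a mapping from session ID to metadata. Those field names and the function name are illustrative assumptions, not the pipeline's real schema.

```python
def untraceable_entries(entries: list[dict], source_index: dict) -> list[int]:
    """Return positions of entries with no usable attribution.

    An entry is traceable if it has a source link, or a session ID
    that resolves to metadata in the source index.
    """
    problems = []
    for position, entry in enumerate(entries):
        if entry.get("source_url"):
            continue
        session_id = entry.get("session_id")
        if session_id and session_id in source_index:
            continue
        problems.append(position)
    return problems
```

An empty result means the chain from observation to session metadata is intact. A session ID that is present but missing from the index is reported too, since it breaks the chain just as much as a missing label.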

Unsupported claims — clean.

All hypothesis-level statements in naver-ranking-hypotheses.md are labeled with a confidence tier (low/medium/high). No statements are framed as established facts without supporting observation records. This is enforced at generation time: the pipeline output format requires a confidence field on every hypothesis statement. That makes the QC check fast, because a missing confidence label would show up as a missing field in the structure, not as something to infer from the wording.
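A required confidence field can be validated structurally. The sketch below assumes each hypothesis is a dictionary with `id` and `confidence` keys. That shape and the function name are hypothetical; only the three tier names come from the text above.

```python
ALLOWED_TIERS = {"low", "medium", "high"}


def hypotheses_missing_confidence(hypotheses: list[dict]) -> list[str]:
    """Return IDs of hypotheses whose confidence is absent or not a known tier."""
    return [
        h.get("id", "<no id>")
        for h in hypotheses
        if h.get("confidence") not in ALLOWED_TIERS
    ]
```

Because the check only inspects a field, it runs without any language understanding. The semantic question of whether the chosen tier is justified still belongs to the reviewer.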

Tool Footprint: Minimal by Design

| Tool | Count |
| --- | --- |
| Read | 6 |
| Total | 6 |

No Edit. No Write. No Bash. Pure verification.

Six tool calls for six files. When a QC session uses more tool calls than there are files under review, it usually signals one of two things: the files needed fixing (which generates Edit calls), or the verification scope crept (which generates extra Read calls for related files). Neither happened here.

Tool count matching declared scope is a signal the session ran cleanly. It’s a useful secondary metric: if a QC session consistently requires more tool calls than expected, the process — or the prompt — needs adjustment.
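The footprint metric is simple enough to compute from a list of tool names. This is a minimal sketch, assuming the session's tool calls are available as a list of strings such as "Read" or "Edit". The function names and the report keys are illustrative, not an existing API.

```python
from collections import Counter


def footprint(tool_calls: list[str], files_in_scope: int) -> dict:
    """Summarize how a session's tool usage compares to its declared scope."""
    counts = Counter(tool_calls)
    reads = counts["Read"]  # Counter returns 0 for tools that never ran
    return {
        "reads": reads,
        "extra_reads": max(0, reads - files_in_scope),
        "non_read_calls": sum(counts.values()) - reads,
    }


def within_scope(tool_calls: list[str], files_in_scope: int) -> bool:
    """True when the session made no extra reads and no non-read calls."""
    report = footprint(tool_calls, files_in_scope)
    return report["extra_reads"] == 0 and report["non_read_calls"] == 0
```

For this session, `within_scope(["Read"] * 6, 6)` is true. Extra reads point toward scope creep, and non-read calls point toward files that needed fixing.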

Building Compliance into the Pipeline Architecture

The reason the hospital leakage check passes consistently isn’t just that QC catches problems — it’s that the pipeline prevents them upstream.

Data collection stores keyword-level labels, not entity-level data. The research file format enforces source attribution on every observation. Hypothesis statements have a required confidence field that forces explicit labeling at generation time. These structural constraints mean QC is verifying that the architecture held, not hunting for problems that slipped through.

This is a meaningful distinction. When QC becomes a bug hunt, it signals structural issues in the pipeline that will keep generating problems. When QC is architectural verification, a pass is meaningful signal: the system is working as designed.

The implication: invest in upstream constraints, not just downstream checks. If you’re consistently finding the same class of problem in QC, the fix is in the data model or the generation prompt — not in making the QC check more aggressive.

Why Opus for This Gate

The model tier decision in this pipeline isn’t about general quality — it’s about where the cost of a miss is highest.

Sonnet handles most of the generation work: drafting SERP summaries, updating the knowledge base, formatting the HTML report. It’s fast, capable, and generation quality issues are recoverable — a draft that needs editing is a fixable problem.

The QC gate is different. The judgment call here: is “강남 치과” ("Gangnam dental") in this context a search keyword or part of a clinic name that slipped through? That distinction is contextual and subtle, and getting it wrong is a compliance issue, not a content quality issue. Compliance issues are not recoverable in the same way. Opus handles that class of nuanced contextual judgment more reliably, and the cost difference between models doesn’t factor into a decision this asymmetric.

The principle generalizes: match model tier to consequence tier. Generation errors are correctable in the next iteration. Compliance misses are not. Spend on the most capable model where the asymmetry in consequences is largest; optimize cost everywhere else.
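One way to make the principle explicit is a small routing table. The sketch below is an assumption-heavy illustration: the stage names are invented from the tasks described above, and the tier strings are plain labels, not API model identifiers.

```python
from enum import Enum


class Consequence(Enum):
    RECOVERABLE = "recoverable"  # a bad draft can be fixed next iteration
    COMPLIANCE = "compliance"    # a miss cannot be taken back


# Hypothetical stage names, based on the work described in this post.
STAGE_CONSEQUENCE = {
    "draft_serp_summary": Consequence.RECOVERABLE,
    "update_knowledge_base": Consequence.RECOVERABLE,
    "format_html_report": Consequence.RECOVERABLE,
    "qc_gate": Consequence.COMPLIANCE,
}

TIER_FOR_CONSEQUENCE = {
    Consequence.RECOVERABLE: "sonnet",
    Consequence.COMPLIANCE: "opus",
}


def model_tier_for(stage: str) -> str:
    """Pick a model tier from the stage's consequence tier.

    Stages that were never classified fall back to the most capable tier.
    """
    consequence = STAGE_CONSEQUENCE.get(stage, Consequence.COMPLIANCE)
    return TIER_FOR_CONSEQUENCE[consequence]
```

Defaulting unclassified stages to the most capable tier is a deliberate choice in this sketch: forgetting to classify a stage should cost money, not accuracy.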

Any automated pipeline has stages where accuracy matters more than speed. The right engineering question is: which stages are those? Identify them explicitly, then don’t optimize cost at those stages. Cost optimization elsewhere more than makes up for it.

Why Log a Session That Changed Nothing

Six tool calls. Zero file modifications. No code written, no bugs fixed, no features shipped. Sessions like this don’t look like they belong in a build log.

But “we didn’t do anything” and “we confirmed there’s nothing wrong” are meaningfully different. One is absence of activity. The other is a verified state.

In automated pipelines, verified states matter for a specific reason: they establish baselines. When something breaks — and it will — the question is always “when did this start?” A dated, logged QC pass lets you answer that question. Without the log, you’re guessing at history. The pipeline ran on 2026-05-14. The files were read. The criteria were checked. That’s worth recording.

There’s a second use case: QC pass logs show that pipeline structural constraints are holding across time. If hospital leakage appears in a future QC session, the pass history shows exactly when it started — which points directly at what changed in the pipeline between the last pass and the first fail. That’s a precise starting point for debugging, not a wide-open search.
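Finding that starting point from a pass history takes only a few lines. This sketch assumes the QC log can be reduced to `(date, passed)` pairs. The function name and the log shape are hypothetical.

```python
from datetime import date
from typing import Optional


def regression_window(
    history: list[tuple[date, bool]],
) -> Optional[tuple[Optional[date], date]]:
    """Return (last pass before it, first failing date), or None if all passed.

    The first element is None when the earliest logged run already failed.
    """
    last_pass = None
    for day, passed in sorted(history):
        if passed:
            last_pass = day
        else:
            return last_pass, day
    return None
```

Whatever changed in the pipeline between the two returned dates is the first place to look.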

Implementation logs tell you what was built. Verification logs tell you what was working and when. Both matter in a system you intend to maintain. In an automated research pipeline that runs daily, the verification history is as important as the commit history.

Result: OK. Report ships tomorrow.


More projects and build logs at jidonglab.com
