collaborative development docs revisited
In January I wrote about Collaborative Development Documentation — a structured way to capture the back-and-forth of human-AI technical work. Since then I’ve written many CDDs across real projects, and my practice has evolved considerably. The structural ideas from that first post still hold, but the interesting part turned out to be something I barely mentioned: how I think when I’m writing the prompts.
what a CDD actually looks like in practice
The bones are the same: a markdown file with YAML frontmatter, organized as numbered Prompt/Response sections. Here’s a typical header:
```yaml
---
date: 2026-02-18
title: fixing field mismatches
current_head: a1b2c3d
agent: Copilot Claude Sonnet 4.5
instructions: _cdd-instructions.md
topics: [cip, schema, pydantic]
---
```
The instructions field points to a shared file that tells the agent how to format responses — where to put the Response header, how to confirm it read the instructions. This keeps the CDD itself clean. The current_head anchors the session to a specific point in the codebase, so you can reconstruct what the agent was looking at.
I also built two VS Code prompt files to reduce friction:
`cdd-start.prompt.md` scaffolds a new CDD — it creates the file with proper frontmatter, names it by date and topic, and drops in a placeholder for the first prompt. One command to start a new session. I started the example above with `/cdd-start fixing field mismatches`.
`cdd.prompt.md` dispatches to a specific prompt number in an existing CDD. When I type `/cdd 3`, the agent reads the instructions file, finds Prompt 3, and executes it. The CDD becomes a script I can run step by step.
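The dispatch step is simple to sketch. A minimal version, assuming prompts are introduced by `Prompt N` headings at the start of a line; the function and regex here are my illustration, not the actual prompt file's implementation:

```python
import re

def find_prompt(cdd_text: str, n: int) -> str:
    """Return the body of Prompt n, up to the next Prompt/Response heading."""
    pattern = rf"^Prompt {n}\b.*?$(.*?)(?=^(?:Prompt|Response) \d|\Z)"
    match = re.search(pattern, cdd_text, re.MULTILINE | re.DOTALL)
    if match is None:
        raise KeyError(f"Prompt {n} not found")
    return match.group(1).strip()

cdd = """Prompt 1: understanding the issue
Look at the mismatches.
Response 1: traced to manual models
The mismatches were introduced in cip_models.py.
Prompt 2: fix the models
Regenerate from the schema.
"""
print(find_prompt(cdd, 2))  # → Regenerate from the schema.
```

Because the prompt body is extracted verbatim, the CDD and the "script" never drift apart: the document is the source of truth for what the agent is asked to do.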
the real subject: how to think in prompts
The structure was always just scaffolding. What I actually learned over a few months of use is a set of thinking habits for writing effective prompts. These aren’t about prompt engineering tricks — they’re about the quality of thought you bring to the collaboration.
give context before asking questions
The single most consistent pattern across every effective CDD prompt: provide the situation first, then ask the question. Not “fix this bug” but:
While creating and testing the backwards ingestion module, we found surprising mismatches between the generated models and the authoritative schema. Here are the mismatches: [table]. Look at the mismatches and determine where in the process the differences were introduced.
The context does three things: it limits the search space, it gives the agent your mental model, and it tells the agent what you already know so it doesn’t waste time explaining it back to you.
anchor on authoritative sources
Early in my CDD practice I would ask things like “is X correct?” and get plausible-sounding answers that were sometimes wrong. The fix was to stop asking for opinions and start asking for evidence:
The tdata files and the ogma files were created independently by two different engineers working from the same T-MSIS spec. Gather ALL INFORMATION YOU CAN to support or reject that the sign indicator is needed. I need evidence from the T-MSIS spec itself.
This transforms the agent from an answerer into a researcher. When the spec says the field is 13 positions, not 14, that’s not an opinion — it’s a fact you can build on. The agent found definitive proof that one implementation was wrong, not by reasoning about it but by reading the specification.
Anchor on specs, schemas, and other authoritative artifacts. Make the agent show its work.
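A sketch of what "evidence from the spec" can mean mechanically: look the field up in the authoritative CSV rather than asking for an opinion. The column names and the `TOT-BILLED-AMT` row below are hypothetical stand-ins, not the real data_elements.csv layout:

```python
import csv
import io

# Hypothetical extract of data_elements.csv; the column names and the
# TOT-BILLED-AMT row are stand-ins, not the real T-MSIS layout.
spec_csv = """DATA-ELEMENT-NAME,SIZE
BILLING-PROV-NUM,30
TOT-BILLED-AMT,13
"""

def field_size(spec_text: str, name: str) -> int:
    """Look up a field's declared size in the spec extract."""
    for row in csv.DictReader(io.StringIO(spec_text)):
        if row["DATA-ELEMENT-NAME"] == name:
            return int(row["SIZE"])
    raise KeyError(name)

# The answer is read from the spec, not reasoned about:
print(field_size(spec_csv, "TOT-BILLED-AMT"))  # → 13
```

When the lookup says 13 positions, a 14-position implementation is wrong by inspection, which is exactly the kind of fact you can build on.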
constrain scope explicitly
I documented seven categories of agent failure early on — among them duplicate work, unnecessary changes, incomplete analysis, introduced bugs, and destructive actions. The common thread: the agent did more than was asked.
The defense is explicit constraints:
- “This program should stop if it hits 10 invalid files.”
- “Minimize query summaries. Drop the statistical queries entirely.”
- “No changes except field names.”
Without these, agents optimize for appearing helpful, which often means doing too much. A good prompt is like a good function signature: it specifies what’s in scope and what isn’t.
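The function-signature analogy can be made literal. A minimal sketch of the first constraint above, with `is_valid` as a toy stand-in for real file validation:

```python
def is_valid(path: str) -> bool:
    # Toy stand-in for real file validation.
    return path.endswith(".fts")

def ingest_files(paths, max_invalid: int = 10):
    """Process files, stopping once max_invalid failures accumulate:
    the coded form of "stop if it hits 10 invalid files"."""
    invalid, processed = 0, []
    for path in paths:
        if not is_valid(path):
            invalid += 1
            if invalid >= max_invalid:
                raise RuntimeError(f"stopping after {invalid} invalid files")
            continue
        processed.append(path)
    return processed

print(ingest_files(["a.fts", "b.txt", "c.fts"]))  # → ['a.fts', 'c.fts']
```

The constraint lives in the signature (`max_invalid: int = 10`), so the scope of the behavior is visible before a single file is read, which is what a good prompt constraint does for the agent.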
build on prior work explicitly
CDDs are sequential documents, but agents don’t automatically carry forward the full context of previous responses. When your next prompt depends on earlier work, say so:
In the course of working CDD 2026-02-09-datarules-subagent.md and 2026-02-09-fact-sales-daily-float-evaluation.md, you generated a set of query evaluations. Start from that set.
We now have ingested the sample data from Response 2. The nuna_rules/RULE_1_v001.java looks for agreement between fields…
This creates a traceable chain. Anyone reading the CDD later can follow the thread. And the agent knows exactly what prior state to build from rather than guessing.
ask the agent to show its work
One of the most productive prompt patterns I discovered was asking the agent how it did something, not just what it did:
You seem to have added Response 2 without problems. Tell me what mechanism you used and whether you think the additional guidance helped.
This catches the difference between an agent that understands a solution and one that got lucky. It also produces documentation of the method, which is useful when the same pattern needs to be repeated.
iterate toward simplicity
My best sessions follow a four-beat pattern: ask for a solution, examine it, ask for simplification, verify. One session went:
- Generated ~70 lines of technical guidance
- “The guidance is awfully verbose. Make it fit your needs.”
- Condensed to ~25 lines
- “The dependence on Tagalog text shouldn’t be necessary. Update accordingly.”
Each step applies pressure that tests real understanding. An agent that can simplify its own output without losing correctness actually understands the problem. Conciseness is a test of comprehension.
document the fork points
When a situation has multiple valid interpretations, don’t pick one and hope — lay them out:
Either: (1) The S9 fields should actually be unsigned 9 fields and the schema is wrong, OR (2) The sign indicator is needed and the test data is wrong. Gather evidence.
This prevents the agent from silently choosing an interpretation and building on a possibly wrong assumption. It also documents your thinking for anyone who reads the CDD later.
reviewing my use of CDDs
Looking across the CDDs for a project, I can see a clear arc. Early sessions were exploratory: “evaluate these rules,” “what does this data mean?” The prompts were broad and the agents would sometimes wander.
By the middle sessions, my prompts became specification-anchored: “validate this schema against the authoritative data elements CSV.” Every claim had to be grounded in a concrete artifact.
By the late sessions, my prompts were building reproducible processes: “create a detailed prompt for regenerating Pydantic models from the schema, then use your subagent tool to execute it.” I was teaching the agent a methodology, not solving a one-off problem.
The pattern: discovery → specification → automation. Each phase’s CDDs built explicitly on the previous phase’s findings. The documents weren’t just records — they were the connective tissue of the project.
the meta-lesson
The original CDD post focused on the what: frontmatter, prompt-response structure, cross-session references. The practice taught me the how: think before you prompt. Provide context. Anchor on facts. Constrain scope. Build chains. Ask for evidence, not opinions. Iterate toward clarity.
These are thinking habits, not formatting rules. They work because they force the human side of the collaboration to be precise — which, more than any structural innovation, is what makes the agent side useful.
an annotated example
The CDD from which most of the examples in this post were drawn — fixing field mismatches (2026-02-18) — is annotated below. Each annotation calls out the thinking habit at work. It also includes a few moments not in the main text: a one-sentence failure response, and a final pivot from “what did you build?” to “why did you build it that way?”
```yaml
---
date: 2026-02-18
title: fixing field mismatches
current_head: 679a0d47a64fc9686a26ba0cc1f91cd20bb45be7
agent: Copilot Claude Sonnet 4.5
session_type:
instructions: _cdd-instructions.md
topics: [pydantic, schema, cip, field-naming, model-generation]
---
```
Prompt 1: understanding the issue
While creating and testing `ogma.ingest.backwards`, we found surprising mismatches between `ogma.ingest.cip_models` and `schema/cip_schema.yaml`.
mismatches
The CIP failures are due to field name mismatches between the schema YAML and Pydantic models:
| Schema YAML | Pydantic Model | Location |
|---|---|---|
| `billing_prov_num` | `bill_provider_num` | CIPHeader line 169 |
| `billing_prov_npi_num` | `bill_provider_npi_num` | CIPHeader line 165 |
| `billing_prov_taxonomy` | `bill_provider_taxonomy` | CIPHeader line 166 |
| `billing_prov_specialty` | `bill_provider_specialty_code` | CIPHeader line 167 |
These fields exist in the Pydantic models but with different names. When the FTS parser reads the schema, it uses the schema field names, but when Pydantic validates, it expects the model field names.
The *_schema.yaml files were created by ogma.generate.schema from the absolutely authoritative data/data_elements.csv
Look at the mismatches and determine where in the process the differences were introduced.
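(Annotation: the mismatch the prompt describes reduces to a set difference. The field names below are from the table in the prompt; the comparison itself is my sketch, not part of the original CDD.)

```python
# Schema keys on one side, manually chosen Pydantic field names on the
# other; names taken from the mismatch table above.
schema_fields = {"billing_prov_num", "billing_prov_npi_num",
                 "billing_prov_taxonomy", "billing_prov_specialty"}
model_fields = {"bill_provider_num", "bill_provider_npi_num",
                "bill_provider_taxonomy", "bill_provider_specialty_code"}

# Names the parser emits that the models reject, and vice versa:
print(sorted(schema_fields - model_fields))
print(sorted(model_fields - schema_fields))
```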
Response 1: field naming discrepancy traced to manual Pydantic models
The mismatches were introduced in cip_models.py, which was written manually and uses different field naming conventions than the authoritative data source and generated schema. The schema generation process correctly transformed the authoritative field names, but the Pydantic models diverged.
Data Flow Analysis
1. Authoritative Source: data_elements.csv
The T-MSIS data dictionary contains the canonical field names:
- Line 561: CIP179 → `BILLING-PROV-NUM`
- Line 567: CIP180 → `BILLING-PROV-NPI-NUM`
- Line 573: CIP181 → `BILLING-PROV-TAXONOMY`
- Line 579: CIP183 → `BILLING-PROV-SPECIALTY`
2. Schema Generation: generate_cip_schema.py
The schema generator correctly transforms these using normalize_key():
```python
def normalize_key(data_element_name):
    """Convert DATA-ELEMENT-NAME to lowercase key."""
    return data_element_name.lower().replace("-", "_")
```
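(Annotation: the transform is a pure string operation, so the schema keys are fully determined by the canonical names. Reproduced here so the snippet stands alone:)

```python
def normalize_key(data_element_name):
    """Convert DATA-ELEMENT-NAME to lowercase key."""
    return data_element_name.lower().replace("-", "_")

print(normalize_key("BILLING-PROV-NUM"))      # → billing_prov_num
print(normalize_key("BILLING-PROV-NPI-NUM"))  # → billing_prov_npi_num
```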
3. Pydantic Models: cip_models.py
The manually-written Pydantic models use inconsistent naming:
- `bill_provider_npi_num` (line 165) — abbreviated “billing” to “bill”, expanded “prov” to “provider”
- `bill_provider_taxonomy` (line 166) — same pattern
- `bill_provider_specialty_code` (line 167) — added “_code” suffix
- `bill_provider_num` (line 169) — same pattern
Conclusion
The discrepancy was introduced when cip_models.py was manually authored without aligning to the schema field names. The schema generation process faithfully preserved the authoritative data element names (modulo case and hyphen-to-underscore conversion), but the Pydantic models used an ad-hoc naming convention.
Prompt 2: fix the CIP Pydantic indirectly
Because ogma.ingest.cip_models was created by an agent, I want you to create a detailed prompt to recreate cip_models from the schema but with explicit guardrails to ensure that the field names are modeled exactly.
Then use your runSubagent tool to make a new set of Pydantic models for CIP. Then check that new module against the old one ensuring
- the field names are in strict agreement
- that the types and initializers also agree
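(Annotation: the agreement check the prompt asks for can be sketched with ordinary introspection. The two classes below are toy stand-ins for the old and regenerated models; real checking would introspect the Pydantic classes the same way.)

```python
from typing import Optional, get_type_hints

# Toy stand-ins for the old and regenerated header models.
class OldHeader:
    bill_provider_num: str
    admsn_date: Optional[str]

class NewHeader:
    billing_prov_num: str
    admsn_date: Optional[str]

def diff_models(old, new):
    """Report field-name and type disagreements between two models."""
    old_h, new_h = get_type_hints(old), get_type_hints(new)
    name_diff = sorted(set(old_h) ^ set(new_h))  # in one model but not both
    type_diff = sorted(f for f in set(old_h) & set(new_h)
                       if old_h[f] != new_h[f])
    return name_diff, type_diff

print(diff_models(OldHeader, NewHeader))
# → (['bill_provider_num', 'billing_prov_num'], [])
```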
Response 2: regenerated CIP models with schema-aligned field names
Successfully regenerated CIP Pydantic models with field names that EXACTLY match cip_schema.yaml, fixing all 14 field name mismatches in CIPHeader and 2 in CIPLine.
The subagent was given strict guardrails:
- EXACT FIELD NAME MATCHING: field names MUST be identical to schema keys
- No modifications allowed: no abbreviating, expanding, or adding suffixes
- Schema is authoritative: `cip_schema.yaml` is the single source of truth
- Type preservation: retain all types from the existing models (str, int, optional fields)
- Structure preservation: maintain model hierarchy, lists, config, and denormalized models
Critical field name corrections verified:
| Schema YAML | Old Pydantic Model | New Pydantic Model | Status |
|---|---|---|---|
| `billing_prov_num` | `bill_provider_num` | `billing_prov_num` | ✅ Fixed |
| `billing_prov_npi_num` | `bill_provider_npi_num` | `billing_prov_npi_num` | ✅ Fixed |
| `billing_prov_taxonomy` | `bill_provider_taxonomy` | `billing_prov_taxonomy` | ✅ Fixed |
| `billing_prov_specialty` | `bill_provider_specialty_code` | `billing_prov_specialty` | ✅ Fixed |
| `billing_prov_type` | `bill_provider_type_code` | `billing_prov_type` | ✅ Fixed |
Beyond the billing provider fields, the regeneration also corrected patient → eligible naming (patient_first_name → eligible_first_name), provider abbreviation consistency (attending_provider_* → admitting_prov_*), and several other terminology corrections.
Prompt 3: generalize model generation prompt
In the last step, you created a special purpose prompt regenerate_cip_models_from_schema.md. I want to recast this as a general Copilot Agent definition that can create a set of Pydantic models from a yaml schema.
The file .github/agents/table-evaluator.md is an example, but you should look up the documentation for custom agent files yourself.
Response 3: copilot got stuck
Copilot created .github/agents/pydantic-model-generator.md and then got stuck trying to run python. I created .github/copilot-instructions to tell it that uv run python is the proper way and moved the uncompleted steps to Prompt 4.
Prompt 4
Using the runSubagent tool to invoke pydantic-model-generator, replace the ogma.ingest.cip_models and elg_models modules starting from their respective schemas.
Finally, run ogma.ingest.main to test the new models.
Response 4: successfully regenerated and tested both CIP and ELG models
Successfully regenerated both cip_models.py and elg_models.py using the pydantic-model-generator agent. Fixed database.py to remove an invalid field reference (hcpcs_rate) exposed by the stricter schema-aligned models. All models now have field names that exactly match their schemas, and ingestion testing passed successfully.
```
T-MSIS FTS Ingestion
Ingesting CIP file: cip_sample.fts
Found 1 claims · Ingested 3 lines · Ingested 3 diagnoses
Ingesting ELG file: elg_sample.fts
Found 1 eligibility records · Ingested 1 eligibility records
Ingestion complete!
```

- ✅ All FTS parsing completed successfully
- ✅ All Pydantic model validations passed
- ✅ Database records inserted correctly
Prompt 5: destroying intermediate python programs
In satisfying Prompt 4 with the subagent, I watched it create python programs for model generation and validation. While I didn’t capture the CIP programs, I was able to stash tools/*.py before you deleted them.
First, why did that happen? Is it a feature of subagents that I need to know?
Second, this suggests that I don’t need a subagent for the conversion but rather a more generalized version of the tools programs you created. Is this correct? Don’t write code, just help me understand.
Response 5: understanding subagent behavior and architectural choices
Subagents don’t auto-delete files — the parent agent attempted cleanup of what appeared to be implementation scaffolding. The ELG tools reveal a better architecture: promote the Python generation scripts to permanent, version-controlled infrastructure while keeping the agent for ad-hoc tasks.
Use subagents when:
- Task is one-time or infrequent
- Requirements might change (the agent can adapt to new constraints)
- You want delegation — “figure out how to do this and report back”
- Process needs human judgment or iteration
Use permanent tools when:
- Task is repetitive (regenerating models after every schema change)
- Process is well-defined and stable
- You want transparency — team members can read and modify the logic
- You need CI/CD integration or automation hooks
- You want version control of the generation logic itself
The Pattern You’ve Discovered
This mirrors a common evolution in AI-assisted development:
- Discovery phase: Agent explores solution space, creates working code
- Capture phase: Recognize valuable patterns in agent’s work
- Formalization phase: Promote ad-hoc scripts to first-class tools
- Hybrid phase: Keep agent for exceptions, use tools for routine work
Best of both worlds: tools are the implementation (reviewed, tested, version controlled); the agent is the interface (convenient, adaptive, able to handle edge cases the tool doesn’t anticipate yet).