apmp · 7 min read · 2026-04-29
How do you run an APMP Color Review on an AI-drafted proposal?
Pink, Red, Gold hat reviews adapted for proposals where the first draft came from an AI agent. The review pattern that prevents both AI hallucinations and human compliance gaps.
What APMP Color Reviews actually are
APMP (Association of Proposal Management Professionals) defines a sequence of "color hats" for proposal reviews. The standard sequence:
| Color | Timing (% complete) | Focus | Output |
|---|---|---|---|
| Blue | 5-10% | Strategy validation | Win/no-bid decision |
| Pink | 50-60% | Compliance + responsiveness | Punch list of gaps |
| Red | 85-90% | Final compliance + scoring | Sign-off or rework |
| Gold | 95-100% | Win theme + discriminator polish | Submit-ready |
(Some firms also run Black Hat for competitor analysis and White Glove for final production polish.)
The reviews are adversarial. Pink team reads the draft as a contracting officer. Red team reads it as the evaluation board. Gold team reads it as the customer.
How AI-drafted proposals break the standard sequence
When an LLM drafts the first 40% of a proposal:
- Pink team finds fewer compliance gaps (the agent is good at hitting Section L instructions).
- Red team finds more hallucinations (the agent invents specific past performance details, NAICS justifications, contracting officer names).
- Gold team finds less differentiation (the agent's prose is bland by design — it averages industry tone).
The fix is to add an AI Pink review before standard Pink, and to weight Red toward hallucination detection.
The new sequence: AI Pink → Pink → Red → Gold
AI Pink (5-10% of total review time; mandatory for any AI draft)
Run a 30-minute review focused entirely on:
- Specific claims that need a source. Every dollar amount, contract number, agency name, and contracting officer name — read the source it came from. If the agent didn't cite one, the claim is suspect.
- Past performance attribution. Did the agent claim a contract you don't actually have? Cross-check against your CPARS records.
- Generic prose flagged as "AI-feel". Phrases like "leveraging best-in-class capabilities" without a specific claim. These should be cut, not edited.
- Plausibly-real but fabricated details. "Awarded to Booz Allen via PIID HC102814C0001" — is that a real PIID? Search USAspending (a lookup sketch follows below). If the PIID doesn't exist, the entire paragraph is suspect.
This review is not optional. AI agents trained on federal contracting data hallucinate at a rate of 8-15% on specific identifiers (FAR clauses, PIIDs, CPARS ratings). Without AI Pink, those hallucinations propagate to submission.
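The PIID check is automatable. A minimal sketch, assuming the public USAspending award-search endpoint; the payload shape below is our reading of that API, not part of this workflow, so verify it against the API docs before wiring it into a pipeline:

```ts
// Check whether a PIID resolves to a real award on USAspending.
// Endpoint and payload shape are assumptions -- confirm against
// https://api.usaspending.gov before relying on this.
async function piidExists(piid: string): Promise<boolean> {
  const res = await fetch("https://api.usaspending.gov/api/v2/search/spending_by_award/", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      filters: {
        award_ids: [piid],
        award_type_codes: ["A", "B", "C", "D"], // contract award types
      },
      fields: ["Award ID"],
      limit: 1,
    }),
  });
  if (!res.ok) throw new Error(`USAspending query failed: ${res.status}`);
  const body = await res.json();
  return Array.isArray(body.results) && body.results.length > 0;
}
```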
Pink (50-60% complete, 60-90 minutes)
Standard APMP Pink — read against Section L and Section M. The compliance matrix is your guide. Every requirement must have a section that addresses it.
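A minimal sketch of how the matrix becomes the Pink punch list, assuming a hypothetical row shape (the real compose_compliance_matrix output has nine columns; only the fields the gap check needs are modeled here):

```ts
// Hypothetical compliance-matrix row; field names are illustrative.
interface ComplianceRow {
  requirementId: string;           // e.g. "L.4.2.1"
  requirementText: string;
  answeringSection: string | null; // proposal section that addresses it
  status: "addressed" | "partial" | "missing";
}

// The Pink punch list: every requirement without a fully answering section.
function pinkPunchList(matrix: ComplianceRow[]): ComplianceRow[] {
  return matrix.filter(
    (row) => row.answeringSection === null || row.status !== "addressed"
  );
}
```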
Red (85-90% complete, 4-6 hours)
Standard APMP Red, with one addition: a second hallucination pass. After the rewrites between Pink and Red, the agent (or human) may have introduced new specifics. Re-validate every dollar, date, and identifier added since Pink.
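A hedged sketch of that second pass: extract every dollar amount, date, and PIID-shaped identifier from both drafts, then flag whatever appeared after Pink. The patterns are illustrative, not a complete identifier grammar.

```ts
// Flag dollar amounts, dates, and PIID-shaped identifiers present in the
// Red-stage draft but absent from the Pink-stage draft.
const IDENTIFIER_PATTERNS: RegExp[] = [
  /\$[\d,]+(?:\.\d+)?[MBK]?/g,     // dollar amounts, e.g. $4.2M
  /\b\d{4}-\d{2}-\d{2}\b/g,        // ISO dates
  /\b[A-Z]{2}\d{6}[A-Z]\d{4}\b/g,  // PIID-shaped, e.g. HC102814C0001
];

function extractIdentifiers(draft: string): Set<string> {
  const found = new Set<string>();
  for (const pattern of IDENTIFIER_PATTERNS) {
    for (const match of draft.matchAll(pattern)) found.add(match[0]);
  }
  return found;
}

// Everything Red must re-validate: identifiers added since the Pink draft.
function addedSincePink(pinkDraft: string, redDraft: string): string[] {
  const seenAtPink = extractIdentifiers(pinkDraft);
  return [...extractIdentifiers(redDraft)].filter((id) => !seenAtPink.has(id));
}
```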
Gold (95-100% complete, 1-2 hours)
Standard APMP Gold — win themes, discriminators, executive summary. Specifically:
- Are the win themes named explicitly (not just implied)?
- Are the discriminators sourced (not just claimed)?
- Does the executive summary preview every M factor in the order they're scored?
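The third check is mechanical enough to script. A naive sketch that tests whether each Section M factor name appears in the executive summary in scored order; substring matching is a stand-in for matching on headings, and the factor names are whatever your matrix carries.

```ts
// Does the executive summary mention every M factor, in the order scored?
// Naive substring matching; swap in heading-level matching for real use.
function previewsFactorsInOrder(execSummary: string, mFactors: string[]): boolean {
  const text = execSummary.toLowerCase();
  let cursor = 0;
  for (const factor of mFactors) {
    const at = text.indexOf(factor.toLowerCase(), cursor);
    if (at === -1) return false; // factor missing, or out of scored order
    cursor = at + factor.length;
  }
  return true;
}

// e.g. previewsFactorsInOrder(draft, ["Technical Approach", "Past Performance", "Price"]);
```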
What our agent does for each color
For Pink:
```ts
reflect_and_critique({
  verdict: "patch",
  selfScore: 78,
  findings: [
    { kind: "uncited_claim", claim: "Booz Allen won the prior VA cloud contract", severity: "high" },
    { kind: "missed_step", stepId: "incumbent-check", label: "Did not verify on USAspending" }
  ],
  patchSummary: "Will pull award by PIID before final draft"
})
```
This is a structured Pink output. The agent self-flags anything that looks uncited. A human Pink team validates the self-flag (catches false positives) and looks for what the agent missed.
For Red, the same tool runs with a stricter rubric — severity: "high" for any uncited claim, no exceptions.
For Gold, the agent's reflection focuses on win themes and discriminators rather than compliance.
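To make that rubric shift concrete, a hypothetical per-color configuration; this is a sketch of the idea, not the tool's actual interface:

```ts
// Hypothetical per-color rubrics for reflect_and_critique. Red escalates
// every uncited claim to "high", per the rule above; Gold shifts focus
// from compliance to win themes. Not the tool's real config surface.
type ReviewColor = "pink" | "red" | "gold";

interface Rubric {
  focusKinds: string[];                    // finding kinds the pass looks for
  uncitedClaimSeverity: "medium" | "high"; // floor severity for uncited claims
}

const RUBRICS: Record<ReviewColor, Rubric> = {
  pink: { focusKinds: ["uncited_claim", "missed_step"], uncitedClaimSeverity: "medium" },
  red:  { focusKinds: ["uncited_claim", "missed_step"], uncitedClaimSeverity: "high" },
  gold: { focusKinds: ["win_theme", "discriminator"],   uncitedClaimSeverity: "medium" },
};
```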
The 80/20 of AI-augmented Color Reviews
If you can only run two reviews (small firm, tight deadline):
- AI Pink at 30% draft (mandatory — catches hallucinations early).
- Red at 90% (combines Pink + Red checks).
- Skip Gold; trust the agent's reflection if its win-theme self-score is ≥ 90.
If you run three:
- AI Pink (20-30% draft)
- Standard Pink (50-60% draft)
- Red (85-90% draft)
Skip Gold only if the executive summary already names the win themes and discriminators explicitly.
If you run all four (resource-rich, large pursuit):
Run AI Pink → Pink → Red → Gold as separate teams. Different reviewers per color. If each pass independently catches, say, 60% of the defects still present, four passes leave roughly 0.4^4 ≈ 2.6% of the originals. That compounding catch-rate is high enough to justify the investment for any pursuit > $5M total contract value.
Common pitfalls
- Letting the agent's reflection substitute for human Red. The reflection is a Pink-equivalent self-check. Red must be a different reader.
- Skipping AI Pink because "the agent looked good." Hallucination rates are silent. They show up in debriefs, not drafts.
- Running Gold as a copy-edit pass. Gold is content review, not grammar. Send the draft to a copy-editor separately if needed.
- No reviewer for the executive summary alone. It's the only section the contracting officer is guaranteed to read. Worth its own 30-minute pass.
A 2-hour total review for a 30-page proposal
- 30 min AI Pink (one experienced capture lead, hallucination focus)
- 60 min Red (one capture lead + one tech SME, scoring focus)
- 30 min Gold (one BD lead, win theme focus)
This compresses APMP doctrine into a small-team-friendly format without losing the essential checks.
Tooling
Our agent supports the workflow via:
- `compose_compliance_matrix` — a 9-column matrix that becomes the Pink/Red checklist.
- `reflect_and_critique` — runs a Pink-equivalent self-check on every draft.
- `compose_proposal_section` — produces sections that can be reviewed independently.
- A `color_review_state` field on the conversation — tracks which color the draft is at.
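A plausible shape for that last field (hypothetical; the real field may carry more):

```ts
// Hypothetical shape of the color_review_state conversation field.
type ColorReviewState = {
  color: "ai_pink" | "pink" | "red" | "gold";
  draftPercentComplete: number; // e.g. 55 when entering standard Pink
  signedOffBy: string | null;   // reviewer of record, null until sign-off
};
```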
The agent handles the structure. Humans hold the judgment.