Multi-Agent Systems Under Cost Constraints: Verification Patterns That Scale

How to verify intelligently without burning your budget

Week 2 · Published 2026-03-04 · ~13 min read

Introduction: The Reliability-Cost Tradeoff

Single-agent systems are fast and cheap. Run a query, get an answer, move on. The problem: they're unreliable. A 10% error rate might be tolerable for email classification. It's catastrophic for trade execution.

The obvious fix: verification. Run multiple agents, cross-check results, add validation layers. This works—error rates drop—but costs explode. Three agents verifying every decision means 3x API bills and 3x latency. At scale, verification becomes the bottleneck.

Most production systems operate between these extremes. You can't afford to verify everything, and you can't afford not to verify anything. The question isn't "should we verify?" but "which decisions are worth verifying, and how much should we spend doing it?"

This is where constraints become useful. Without cost pressure, you'd triple-check every output. With it, you're forced to think strategically: structured validation for routine decisions, Actor-Critic for moderate-stakes work, triangulation for high-value calls. The constraint drives the design.

This piece covers:

  • Independent work patterns that cut latency without extra spend
  • The verification spectrum: structured validation, Actor-Critic, triangulation
  • Cost frameworks for deciding which decisions deserve verification
  • Practical patterns, anti-patterns, and a maturity ladder for rolling this out

The goal: reliable output within cost constraints. Not perfection—intelligent risk management.

Independent Work Patterns: Why They Matter for Cost Efficiency

Verification becomes expensive when agents wait on each other. Sequential validation—agent A finishes, agent B checks, agent C double-checks—compounds latency and burns budget on idle time. Independent work patterns break this bottleneck.

Parallel Execution

Multiple agents work simultaneously. Instead of scanning 10 market sectors sequentially (50 minutes at 5 min/sector), you fan out to 10 agents (5 minutes total). You pay for 10x API calls, but you get 10x speed.

When it pays off: High-frequency decisions where time has value. Market data that's stale after 10 minutes. Customer queries where response time affects conversion. Content moderation where backlogs create risk.

Cost structure:

  • Sequential: 10 API calls, ~50 minutes wall-clock
  • Parallel: 10 API calls, ~5 minutes wall-clock

Same cost, 10x faster. The tradeoff: orchestration complexity (managing 10 concurrent tasks) and potential resource contention (rate limits, memory).
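The fan-out above is a few lines with a thread pool. This is a minimal sketch: `scan_sector` is a hypothetical stand-in for your per-sector agent call, and the stub signal logic exists only so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-sector scan; a real version would call your agent/LLM API.
def scan_sector(sector: str) -> dict:
    return {"sector": sector, "signal": sector.endswith(("0", "5"))}  # stub logic

sectors = [f"sector-{i}" for i in range(10)]

# Fan out: all 10 scans run concurrently instead of back-to-back.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(scan_sector, sectors))  # preserves input order

flagged = [r["sector"] for r in results if r["signal"]]
```

`pool.map` keeps results in input order, which makes downstream bookkeeping simple; watch provider rate limits when you raise `max_workers`.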

Task Decomposition

Break complex workflows into independent sub-tasks, each with its own verification budget. Not everything needs the same level of scrutiny.

Trading pipeline example:

  1. Scanner (light verification): Check 50 stocks for signals → structured validation only (schema, bounds)
  2. Analysis (medium verification): Deep-dive on 5 flagged stocks → Actor-Critic pattern
  3. Execution (heavy verification): Risk check before $50k trade → triangulation (3 agents must agree)

Total verification cost: $0.50 (scanner) + $5 (analysis) + $15 (execution) = $20.50

Compare to uniform verification (triangulation on everything): 50 stocks × $15 = $750. Task decomposition saves $729.50 by matching verification spend to decision value.
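The arithmetic is worth making explicit, since it drives the whole design. A quick sketch using the figures from the pipeline above (per-unit costs are backed out from the stage totals):

```python
# Verification spend for the decomposed pipeline vs. uniform triangulation.
stage_costs = {
    "scanner":   50 * 0.01,   # structured validation, ~$0.01/stock
    "analysis":   5 * 1.00,   # Actor-Critic, ~$1/stock
    "execution":  1 * 15.00,  # triangulation, $15/trade
}

decomposed = sum(stage_costs.values())  # $20.50
uniform = 50 * 15.00                    # triangulating all 50 stocks: $750
savings = uniform - decomposed          # $729.50
```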

Avoiding Sequential Bottlenecks

Traditional validation creates waterfalls: agent proposes → critic validates → executor acts. If the critic is slow or the executor is blocked, everything stalls.

Async validation pattern:

  1. Agent proposes decision
  2. Execution starts (low-risk actions)
  3. Validator checks in parallel
  4. If validation fails, rollback or escalate

This works for decisions with low rollback cost. Example: document classification can be corrected cheaply if validation flags an error after the fact. Trade execution cannot be undone, so async validation doesn't apply there.
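The four steps above can be sketched with `asyncio`. Everything here is a hypothetical stand-in (the `execute` and `validate` coroutines are stubs, not a real classifier or critic); the point is the shape: execution and validation run concurrently, and a failed validation triggers a cheap rollback.

```python
import asyncio

# Minimal sketch of the async validation pattern; all functions are
# hypothetical stand-ins, not a real classifier or critic.
async def execute(doc: str, label: str) -> str:
    return f"filed {doc!r} as {label}"      # low-risk action, cheap to undo

async def validate(doc: str, label: str) -> bool:
    await asyncio.sleep(0)                  # stands in for a slower critic call
    return label in {"invoice", "receipt", "contract"}

async def process(doc: str, label: str) -> str:
    # Steps 2-3: execution starts while the validator checks in parallel.
    _, ok = await asyncio.gather(execute(doc, label), validate(doc, label))
    if ok:
        return label
    return "rolled_back"                    # step 4: rollback or escalate

label = asyncio.run(process("q1.pdf", "invoice"))
```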

Verification Architectures: The Spectrum

Verification isn't binary. Between "no checking" and "full triangulation" lies a spectrum of architectures, each with different cost-reliability tradeoffs.

Structured Validation (Cheapest)

Rule-based checks: schema validation, range constraints, type checking, sanity tests. No LLM calls—just code.

What it catches:

  • Malformed outputs: wrong schema, missing fields, bad types
  • Impossible values: negative prices, weights that don't sum to 1.0, dimension mismatches

What it misses:

  • Well-formed but wrong answers: flawed assumptions, incorrect models, bad reasoning

Cost: Negligible. Milliseconds of compute, no API calls.

ROI: Extremely high for well-defined outputs. Catches 60-80% of errors for <1% of verification cost.

Example: Financial calculation validation

def validate_portfolio(weights, prices, positions):
    # Compare with a tolerance: floating-point weights rarely sum to exactly 1.0
    assert abs(sum(weights) - 1.0) < 1e-9, "Weights must sum to 1.0"
    assert all(w >= 0 for w in weights), "Weights must be non-negative"
    assert all(p > 0 for p in prices), "Prices must be positive"
    assert len(weights) == len(positions), "Dimension mismatch"
    return True

Cost: ~0.1ms. Catches dimensional errors, impossible values, basic math mistakes. Doesn't catch: incorrect risk model, wrong asset correlations, flawed assumptions.

Pattern: Always start here. Structured validation is your first line of defense. It's fast, cheap, and catches the low-hanging fruit.

Actor-Critic Pattern (Medium Cost)

One agent proposes, another critiques. The proposer generates a solution; the critic evaluates it for errors, edge cases, and logical consistency.

Architecture:

Proposer (GPT-4o): "Execute buy order for AAPL, 100 shares, market order"
Critic (GPT-4o-mini): "Risk check: Position would exceed 10% portfolio allocation. 
                       Recommend reduce to 50 shares."

Cost structure:

  • Proposer: full-strength model (~$5/decision in the trading example)
  • Critic: cheaper model (~$0.50-$1/decision)
  • Total: roughly 1.1-1.2x single-agent cost

What it catches:

  • Constraint violations the proposer overlooked (position limits, risk caps)
  • Edge cases and logical inconsistencies in the proposed plan
ROI sweet spot: Decisions worth $100-$10k where full triangulation is overkill but single-agent is too risky.

Pattern: Use cheaper model for critic. It doesn't need to generate solutions, just spot problems. GPT-4o-mini or Claude Haiku work well for criticism.
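A minimal sketch of the loop, assuming a `call_llm` wrapper around whatever client you use. The wrapper and its canned replies are hypothetical, there only so the sketch runs offline; in production each call hits a real model.

```python
# Sketch of the Actor-Critic loop. `call_llm` is a hypothetical wrapper
# around your LLM client; canned replies make the sketch runnable offline.
def call_llm(role: str, prompt: str) -> str:
    canned = {
        "proposer": "BUY AAPL 100 shares, market order",
        "critic": "REJECT: position would exceed 10% allocation; cap at 50 shares",
    }
    return canned[role]

def actor_critic(task: str) -> dict:
    proposal = call_llm("proposer", f"Propose a decision for: {task}")
    critique = call_llm("critic", f"Find risks in this proposal: {proposal}")
    approved = not critique.startswith("REJECT")
    return {"proposal": proposal, "critique": critique, "approved": approved}

decision = actor_critic("AAPL entry signal")
```

Parsing a leading REJECT token is one simple convention for making the critique machine-readable; structured JSON output works equally well.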

Triangulation (Higher Cost, Higher Confidence)

Multiple agents solve the same problem independently. Compare results: agreement builds confidence, divergence signals uncertainty.

Architecture:

Agent A: Calculates portfolio VaR = $47,250
Agent B: Calculates portfolio VaR = $48,100
Agent C: Calculates portfolio VaR = $46,950

Convergence check: Max deviation = 2.4% → within tolerance, proceed
Divergence scenario: A=$47k, B=$53k, C=$48k → flag for human review
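The convergence check reduces to a few lines. A sketch under stated assumptions: the 5% tolerance and the median-based deviation measure are illustrative choices, not the article's exact metric.

```python
# Convergence check for triangulated estimates. The 5% tolerance and
# median-based deviation are assumptions, not a fixed prescription.
def triangulate(estimates, tolerance=0.05):
    center = sorted(estimates)[len(estimates) // 2]   # median resists one outlier
    max_dev = max(abs(e - center) / center for e in estimates)
    return "proceed" if max_dev <= tolerance else "escalate_to_human"

verdict = triangulate([47_250, 48_100, 46_950])   # tight spread
flagged = triangulate([47_000, 53_000, 48_000])   # one agent diverges
```

Using the median as the reference point means one wild outlier flags review rather than dragging the consensus toward itself.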

Cost structure:

  • 3x agent cost per decision (three independent runs), plus a cheap convergence check

What it catches:

  • Model-specific errors and hallucinations: independent agents rarely make the same mistake
  • Genuine uncertainty: divergent answers flag problems no single agent would surface

When worth it:

  • High-stakes decisions (> $10k impact) on the critical path
  • Irreversible actions where rollback is impossible or expensive

Pattern: Don't triangulate everything. It's expensive. Use it for critical path decisions where error cost justifies 3x verification spend.

Cost Frameworks: When Verification Pays Off

Verification isn't free, and errors aren't free. The question is: which is more expensive?

The ROI Calculation

The formula is straightforward:

Net_Benefit = (Error_Cost × Error_Rate_Without) - (Error_Cost × Error_Rate_With) - Verification_Cost

Positive net benefit: Verify. You save more than you spend.
Negative net benefit: Skip verification. Accepting errors is cheaper than preventing them.

Example: Trading signal verification

Without verification:

  • Single agent: $5/decision
  • Error rate: 10%
  • Average trade value: $10,000
  • Error cost: $500 (5% slippage on wrong trades)
  • Expected error cost: 10% × $500 = $50/decision

With Actor-Critic verification:

  • Proposer + critic: $6/decision ($5 + $1)
  • Error rate: 2% (measured after 1,000 decisions)
  • Expected error cost: 2% × $500 = $10/decision
  • Net benefit: $50 - $10 - $1 = $39 saved per decision

At 100 decisions/day: $3,900/day savings from $100/day verification spend. ROI: 3,900%
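The worked example translates directly into code. A sketch using the numbers above (the function name is illustrative):

```python
# Net benefit per decision, using the figures from the worked example.
def net_benefit(error_cost, rate_without, rate_with, verification_cost):
    return (error_cost * rate_without
            - error_cost * rate_with
            - verification_cost)

per_decision = net_benefit(error_cost=500, rate_without=0.10,
                           rate_with=0.02, verification_cost=1.0)  # $39
daily_savings = per_decision * 100          # 100 decisions/day
daily_spend = 1.0 * 100                     # incremental critic cost
roi_pct = 100 * daily_savings / daily_spend
```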

Decision Value Tiers

Not all decisions deserve the same verification budget. Allocate spend based on impact.

Tier 1: Low-value decisions (< $100 impact)

  • Structured validation only. Error cost doesn't justify LLM verification.

Tier 2: Medium-value decisions ($100-$10k impact)

  • Actor-Critic with a cheap critic model. Escalate on rejection.

Tier 3: High-value decisions (> $10k impact)

  • Triangulation, plus human review when agents diverge.
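The tiering reduces to a simple router. A minimal sketch, assuming the dollar thresholds above (the function and tier names are illustrative):

```python
# Route a decision to a verification tier by dollar impact.
# Thresholds match the value tiers described above.
def verification_tier(impact_usd: float) -> str:
    if impact_usd < 100:
        return "structured_validation"
    if impact_usd <= 10_000:
        return "actor_critic"
    return "triangulation"

tiers = [verification_tier(v) for v in (10, 5_000, 50_000)]
```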

Frequency Economics

Decision frequency affects verification affordability. High-frequency decisions need cheaper verification or costs spiral.

As frequency increases, shift verification architecture toward cheaper methods. At 1000x/day, you can't afford triangulation on everything—structured validation becomes your primary defense.
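To see why frequency forces the shift, multiply it out. The per-check unit costs here are assumed figures for illustration:

```python
# Daily verification spend by architecture; unit costs are assumed figures.
per_check = {"structured": 0.001, "actor_critic": 1.00, "triangulation": 15.00}

daily_cost = {arch: 1_000 * cost for arch, cost in per_check.items()}
# At 1,000 decisions/day, triangulating everything runs $15,000/day,
# while structured validation stays around $1/day.
```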

Practical Patterns & Anti-Patterns

Patterns That Work

✓ Verify critical path, trust supporting tasks
Trading pipeline: Heavy verification on execution (triangulation), light verification on data ingestion (structured). Spend budget where errors hurt most.

✓ Use cheaper models for criticism
Proposer: GPT-4o ($5). Critic: GPT-4o-mini ($0.50). The critic doesn't need full reasoning power—it's checking for obvious mistakes. Save 80% on verification cost without sacrificing much accuracy.

✓ Structured validation first
Before running expensive LLM verification, catch format errors and constraint violations with code. Structured validation catches 60-80% of errors for <1% of cost. It's your filter—only pass validated outputs to LLM critics.

✓ Escalate divergence, not every result
Triangulation: If 3 agents agree, proceed automatically. If they diverge, escalate to human. Human time is expensive—use it for uncertain cases, not routine agreement.

Anti-Patterns to Avoid

✗ Uniform verification
Treating $10 and $10,000 decisions the same. Wastes budget on over-verification (low-stakes) or under-verification (high-stakes). Tier your decisions and match verification spend to impact.

✗ Sequential verification gates
Waterfall bottlenecks: agent A proposes → agent B validates → agent C double-checks → agent D executes. Latency compounds, costs stack, throughput collapses. Parallelize where possible.

✗ Over-verification
Triple-checking email classification or document tagging. If error cost is $0.50 and verification costs $5, you're burning money. Structured validation is sufficient for low-stakes work.

✗ No measurement
Running verification without tracking error rates or costs. You can't optimize what you don't measure. Log errors, calculate costs, compute ROI. Adjust based on data, not assumptions.

Getting Started: Verification Maturity Ladder

You don't need to implement everything at once. Start simple, measure, expand based on ROI.

Stage 1: Single agent + structured validation. Catch format and constraint errors with code. Measure your baseline error rate.

Stage 2: Actor-Critic on high-value decisions. Add a cheap critic where errors are costly. Track the error-rate drop.

Stage 3: Selective triangulation. Reserve three-agent consensus for critical-path decisions. Escalate divergence to humans.

Stage 4: Adaptive verification. Adjust verification level dynamically based on measured error rates, decision value, and remaining budget.

Most production systems operate at Stage 2-3. Stage 4 (adaptive) is advanced—useful for high-volume systems where verification cost compounds quickly.

Conclusion: Constraints Drive Design

Verification isn't about achieving perfection. It's about intelligently reducing error rates within cost constraints.

Single agents are fast and cheap but unreliable. Full verification is reliable but slow and expensive. The art is operating in the middle: structured validation for routine work, Actor-Critic for moderate stakes, triangulation for critical decisions.

Key takeaways:

  • Match verification spend to decision value: tier your decisions, don't verify uniformly
  • Structured validation first: it catches 60-80% of errors for <1% of the cost
  • Actor-Critic with a cheap critic covers moderate stakes; reserve triangulation for critical decisions
  • Measure error rates and costs, then compute ROI; adjust verification levels from data, not assumptions

The constraint isn't the enemy. It's the forcing function that drives good design. Without cost pressure, you'd verify everything and burn budget. With it, you're forced to think: Which errors actually matter? Where should I spend verification budget? What's the cheapest way to catch 80% of mistakes?

Next steps:

  1. Measure baseline error rates. Run your single-agent system, log outputs, manually review a sample. What percentage has mistakes?
  2. Calculate error costs. What's the impact of a wrong decision? Wasted time? Financial loss? Customer churn?
  3. Design verification architecture. Map decisions to value tiers. Choose verification patterns (structured, Actor-Critic, triangulation) based on ROI.
  4. Track improvements. Measure error rates after verification. Compute ROI: error cost reduction vs verification spend.
  5. Iterate. Adjust verification levels based on data. Over-verifying low-stakes work? Scale back. Under-verifying high-stakes decisions? Add triangulation.

Production reliability comes from intelligent tradeoffs, not unlimited budget. Verification that scales is verification that pays for itself.