LLM math failure modes in deal workflows
Large language models are useful in finance workflows, but they are not deterministic calculators. In production agent systems, the expensive failures are usually not dramatic hallucinations. They are subtle arithmetic drifts that look plausible and pass review until the numbers reach a decision memo.
Failure mode 1: inconsistent denominator choices
The same model can switch between EBITDA and EBIT as the denominator of the same ratio from one turn to the next. Each output appears coherent, but the series is no longer comparable, especially when an agent summarizes multiple companies into one recommendation.
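One way to rule this out is to make the denominator an explicit, required argument of the calculation instead of something the model picks per turn. The sketch below is illustrative only: `Denominator`, `Financials`, and `leverage_ratio` are hypothetical names, and the figures are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Denominator(Enum):
    EBITDA = "ebitda"
    EBIT = "ebit"


@dataclass(frozen=True)
class Financials:
    net_debt: float
    ebitda: float
    ebit: float


def leverage_ratio(f: Financials, denominator: Denominator) -> float:
    """Net debt divided by the explicitly named earnings measure."""
    base = f.ebitda if denominator is Denominator.EBITDA else f.ebit
    if base <= 0:
        raise ValueError(f"{denominator.value} must be positive, got {base}")
    return f.net_debt / base


# Made-up figures: the same company, two different answers.
company = Financials(net_debt=300.0, ebitda=100.0, ebit=75.0)
on_ebitda = leverage_ratio(company, Denominator.EBITDA)  # 3.0
on_ebit = leverage_ratio(company, Denominator.EBIT)      # 4.0
```

Because the caller must name the denominator, a series built across companies can be checked for a single consistent choice before it reaches a summary.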
Failure mode 2: timebase drift in returns
We repeatedly observe annualization mistakes where monthly assumptions leak into annual IRR and XIRR math. A 1% monthly rate compounds to roughly 12.7% a year, not 1% and not 12%. The result still has two decimals and can look professional, but it is materially wrong.
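To make the size of this error concrete, here is a minimal sketch of the compounding step that gets skipped. The function name `annualize` and the 1% monthly figure are assumptions chosen for illustration.

```python
def annualize(periodic_rate: float, periods_per_year: int) -> float:
    """Compound a per-period rate into an effective annual rate."""
    if periods_per_year <= 0:
        raise ValueError("periods_per_year must be positive")
    return (1.0 + periodic_rate) ** periods_per_year - 1.0


monthly_irr = 0.01  # 1% per month, e.g. from a monthly cash-flow model

mislabeled = monthly_irr          # monthly figure reported as if it were annual
simple_scaled = monthly_irr * 12  # nominal scaling, ignores compounding
effective = annualize(monthly_irr, 12)

print(f"{mislabeled:.2%} {simple_scaled:.2%} {effective:.2%}")
# prints: 1.00% 12.00% 12.68%
```

All three numbers look like a plausible return, which is why carrying the period alongside the rate matters more than the formatting of the output.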
Failure mode 3: branch-specific rounding behavior
Multi-step deal models branch on assumptions. LLM-generated arithmetic often rounds in one branch and carries precision in another, creating unexplained spread between scenarios that should be near-identical.
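The effect is easy to reproduce. The sketch below runs the same projection twice, once rounding to cents every year and once rounding only at the end. The function `exit_value` and all figures are hypothetical; `Decimal` is used so the arithmetic itself is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(x: Decimal) -> Decimal:
    return x.quantize(CENT, rounding=ROUND_HALF_UP)


def exit_value(ebitda: Decimal, growth: Decimal, years: int,
               multiple: Decimal, round_each_year: bool) -> Decimal:
    """Grow EBITDA for `years`, then apply an exit multiple."""
    value = ebitda
    for _ in range(years):
        value = value * (1 + growth)
        if round_each_year:
            value = to_cents(value)  # intermediate rounding, as in one branch
    return to_cents(value * multiple)


args = (Decimal("10.00"), Decimal("0.125"), 2, Decimal("10"))
precise = exit_value(*args, round_each_year=False)  # 126.56
rounded = exit_value(*args, round_each_year=True)   # 126.60
spread = rounded - precise                          # 0.04
```

Identical assumptions produce a 0.04 spread purely from where the rounding happens. A single rounding policy, applied once at the output boundary, removes it.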
Failure mode 4: hidden unit mismatches
Thousands, millions, and raw dollars are mixed more often than teams expect. One mislabeled field in a prompt chain can put projections off by orders of magnitude (1,000x when thousands are read as millions), while the percentages, which are unit-free ratios, still look realistic.
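A common defence is to carry the unit with every amount and normalize to raw dollars before any arithmetic. The sketch below assumes three hypothetical unit tags (`USD`, `USD_K`, `USD_M`) and made-up figures.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical unit tags; an unknown tag is rejected instead of guessed.
UNIT_MULTIPLIERS = {
    "USD": Decimal(1),
    "USD_K": Decimal(1_000),
    "USD_M": Decimal(1_000_000),
}


@dataclass(frozen=True)
class Amount:
    value: Decimal
    unit: str

    def to_dollars(self) -> Decimal:
        if self.unit not in UNIT_MULTIPLIERS:
            raise ValueError(f"unknown unit {self.unit!r}")
        return self.value * UNIT_MULTIPLIERS[self.unit]


revenue = Amount(Decimal("12.5"), "USD_M")  # 12.5 million
opex = Amount(Decimal("4200"), "USD_K")     # 4,200 thousand

margin = (revenue.to_dollars() - opex.to_dollars()) / revenue.to_dollars()
# margin == Decimal("0.664")
```

Subtracting the raw values directly (12.5 minus 4200) would give a nonsensical result, and dividing two amounts that share the same wrong unit would still give a believable percentage, which is why the check belongs on the inputs.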
Failure mode 5: non-reproducible retries
When an agent reruns the same analysis after a policy check or timeout, LLM arithmetic can shift enough to trigger inconsistent approvals. Deterministic endpoints remove that class of drift: identical inputs produce identical outputs.
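A retry can be checked mechanically by fingerprinting the request and the response. The sketch below is a minimal illustration: `compute_coverage` is a hypothetical local stand-in for a deterministic endpoint, and the field names are assumptions.

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compute_coverage(inputs: dict) -> dict:
    """Hypothetical deterministic calculation: income over debt service."""
    return {"coverage": round(inputs["income"] / inputs["debt_service"], 4)}


first_request = {"income": 1_250_000, "debt_service": 1_000_000}
retry_request = {"debt_service": 1_000_000, "income": 1_250_000}  # same data

first = compute_coverage(first_request)
retry = compute_coverage(retry_request)

same_inputs = fingerprint(first_request) == fingerprint(retry_request)
same_outputs = fingerprint(first) == fingerprint(retry)
```

If `same_inputs` is true and `same_outputs` is false, the drift came from the computation and not from the data, which is the case a deterministic endpoint is meant to eliminate.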
Practical pattern that works
- Use the LLM for extraction, narrative, and assumption generation.
- Send numerical computation to deterministic APIs.
- Store endpoint inputs and outputs as evidence artifacts per run.
- Gate promotions or capital-allocation actions on reproducible outputs only.
This gives you the upside of model flexibility and the reliability of deterministic computation. We run this split architecture across underwriting and portfolio workflows because it is debuggable under pressure.
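The four steps above can be sketched end to end. This is an illustration under stated assumptions, not a description of any particular service: `net_leverage` stands in for a deterministic endpoint, the `assumptions` dict stands in for what the LLM extraction step would return, and the evidence log is an in-memory list.

```python
import hashlib
import json
from typing import Callable, Dict, List


def digest(obj: Dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def net_leverage(inputs: Dict) -> Dict:
    """Hypothetical stand-in for a deterministic computation endpoint."""
    return {"net_debt_to_ebitda": inputs["net_debt"] / inputs["ebitda"]}


def run_with_evidence(compute: Callable[[Dict], Dict], inputs: Dict,
                      log: List[Dict]) -> Dict:
    """Run the computation and record inputs, output, and their hashes."""
    output = compute(inputs)
    log.append({
        "inputs": inputs,
        "output": output,
        "input_hash": digest(inputs),
        "output_hash": digest(output),
    })
    return output


def gate(log: List[Dict], min_runs: int = 2) -> bool:
    """Allow the action only if enough runs exist and all of them agree."""
    input_hashes = {entry["input_hash"] for entry in log}
    output_hashes = {entry["output_hash"] for entry in log}
    return (len(log) >= min_runs
            and len(input_hashes) == 1
            and len(output_hashes) == 1)


# Assumptions as they might come back from the LLM extraction step (made up).
assumptions = {"net_debt": 300.0, "ebitda": 100.0}

evidence: List[Dict] = []
run_with_evidence(net_leverage, assumptions, evidence)
run_with_evidence(net_leverage, assumptions, evidence)  # retry of the same run
approved = gate(evidence)  # True: both runs agree
```

The model never touches the arithmetic, and the evidence log holds everything needed to replay a disputed number later.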
Turn this into paid agent workflows
Move from concept to execution with deterministic endpoints and transparent per-call pricing. Start from API docs, then route your agent to the right paid service.