Cycle Time and Throughput in the Era of AI Agents

AI coding agents have made one thing abundantly clear: generating code is no longer the bottleneck. A capable AI agent can produce a working implementation in minutes. But the code still needs to be reviewed, tested, approved, and deployed. When teams focus on generation speed while ignoring the rest of the pipeline, they end up with a paradox — more PRs open, but delivery doesn’t actually get faster.

The throughput illusion

Throughput — the number of pull requests merged per unit of time — is a tempting metric. It is easy to measure and easy to improve when AI agents are opening PRs at machine speed. A team that merged 20 PRs per week before AI adoption now merges 60. Surely that means three times the productivity?

Not necessarily. If those 60 PRs deliver the same amount of user-facing value as the original 20, you have not increased productivity. You have increased transaction volume. The AI decomposed work into smaller units, which inflates throughput without changing outcomes.

Raw throughput only matters when the unit of work is consistent. In AI-augmented teams, it rarely is. Some PRs represent substantial features; others are trivial auto-generated fixes that a human would have batched into a single commit.

Decomposing cycle time

Cycle time is a more honest metric because it spans the entire delivery pipeline, but you need to decompose it to find the real story.

Coding time — from task start to first PR — has collapsed in AI-augmented workflows. What used to take days might take hours or minutes. This is the stage where AI has the most dramatic impact.

Review time — from PR opened to first review — often increases as volume rises. Reviewers become the new bottleneck. If your coding time dropped from three days to three hours but review time grew from four hours to two days, your overall cycle time only fell from roughly 76 hours to 51: a modest gain for a 24x speedup in coding.

Approval-to-merge time captures the gap between review completion and the actual merge. In teams with CI/CD pipelines, this reflects test suite duration, merge queue wait times, and deployment pipeline health. AI-generated code does not make your test suite faster.

Deploy time — from merge to production — is typically independent of who or what wrote the code. But if AI is generating more changes faster, your deployment pipeline may be processing more frequent deploys, which can introduce its own latency if the pipeline is not scaled to match.

By breaking cycle time into these stages, you can see exactly where AI is helping and where it is creating new pressure points.
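As a sketch of that breakdown, assuming you can export per-PR timestamps from your tooling (the field names below are hypothetical, not a real API):

```python
from datetime import datetime

# Hypothetical timestamps for one PR; in practice these would come
# from your VCS and CI tooling.
pr = {
    "task_started": datetime(2024, 5, 1, 9, 0),
    "pr_opened":    datetime(2024, 5, 1, 11, 0),
    "first_review": datetime(2024, 5, 2, 15, 0),
    "merged":       datetime(2024, 5, 2, 17, 0),
    "deployed":     datetime(2024, 5, 2, 18, 0),
}

def stage_hours(pr):
    """Split total cycle time into the four stages described above."""
    hours = lambda a, b: (pr[b] - pr[a]).total_seconds() / 3600
    return {
        "coding":            hours("task_started", "pr_opened"),
        "review":            hours("pr_opened", "first_review"),
        "approval_to_merge": hours("first_review", "merged"),
        "deploy":            hours("merged", "deployed"),
    }

stages = stage_hours(pr)
total = sum(stages.values())
for name, h in stages.items():
    print(f"{name:18s} {h:5.1f} h  ({h / total:.0%} of cycle time)")
```

With these illustrative numbers, coding is 2 of 33 total hours (about 6%), and review dominates: exactly the pressure-point pattern described above.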

Why percentiles matter

When analyzing cycle time in AI-augmented workflows, averages are particularly misleading. If 80% of your PRs (simple, AI-generated) cycle in two hours but 20% (complex, human-driven) take five days, the average looks reasonable at roughly one day. But that average hides the fact that your most important work is stuck in a five-day pipeline.

P75 cycle time tells you what 75% of your work completes within. This is your “typical” experience and a good target for process improvement.

P95 cycle time reveals your worst-case scenarios. In hybrid teams, this often captures the complex PRs that require human judgment — architecture changes, security-sensitive code, cross-team dependencies. These are precisely the PRs that matter most and take the longest.

Tracking both percentiles alongside the median gives you a distribution view. If your P50 is shrinking (AI makes simple work faster) but your P95 is growing (complex work gets deprioritized or stuck behind review queues), you have a systemic problem that the average would never reveal.
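A minimal sketch of that distribution view, using the synthetic 80/20 split from the example above (all numbers illustrative):

```python
import statistics

# 80 simple PRs cycling in ~2 hours, 20 complex PRs taking ~5 days (120 h).
cycle_hours = [2.0] * 80 + [120.0] * 20

def percentile(data, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ranked = sorted(data)
    k = max(0, -(-len(ranked) * p // 100) - 1)  # ceil(n * p / 100) - 1
    return ranked[int(k)]

print(f"mean {statistics.mean(cycle_hours):.1f} h")  # 25.6 h: "roughly one day"
print(f"P50  {percentile(cycle_hours, 50):.1f} h")   # 2.0 h
print(f"P75  {percentile(cycle_hours, 75):.1f} h")   # 2.0 h
print(f"P95  {percentile(cycle_hours, 95):.1f} h")   # 120.0 h: the five-day tail
```

The mean lands near one day, yet no PR in the sample takes a day: the typical experience is two hours, and the work that matters most sits at the 120-hour P95.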

Balancing speed and flow

The goal is not to minimize cycle time at all costs. It is to maintain healthy flow across all types of work. A few principles help:

Set work-in-progress limits that account for AI-generated volume. If your WIP limit was five PRs per developer and AI now generates ten, you need to adjust either the limit or the review process to prevent queue buildup.

Categorize work by complexity. Simple, AI-generated changes should flow through a lightweight review track. Complex changes need dedicated review time. Treating all PRs identically when their nature is fundamentally different leads to either over-processing simple work or under-reviewing complex work.
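One way such routing might look as code; the track names, thresholds, and path patterns here are assumptions for illustration, not a prescription:

```python
# Paths that should always get dedicated human review (illustrative).
SENSITIVE_PATHS = ("auth/", "billing/", "migrations/")

def review_track(pr):
    """Route a PR to a review track based on touched paths, origin, and size."""
    if any(f.startswith(SENSITIVE_PATHS) for f in pr["files"]):
        return "dedicated"    # security-sensitive: always human-reviewed
    if pr["ai_generated"] and pr["lines_changed"] < 50:
        return "lightweight"  # small AI-generated fix: fast-track review
    return "standard"

print(review_track({"files": ["auth/login.py"], "ai_generated": True, "lines_changed": 10}))
# prints "dedicated"
```

The point of the sketch is the precedence: sensitivity trumps origin, so an AI-generated change to a sensitive path never slips through the lightweight track.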

Monitor queue depth alongside cycle time. A short cycle time with a deep queue means you are moving items quickly once they start, but items are waiting too long to start. Both metrics together paint the full picture.
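Queue depth and wait time are linked by Little's Law (L = λW: average queue depth equals arrival rate times average wait). With illustrative numbers:

```python
# Little's Law: L = arrival rate x average wait time.
arrivals_per_day = 12   # PRs opened per day (AI-inflated volume)
avg_wait_days = 1.5     # average time a PR sits before review starts

queue_depth = arrivals_per_day * avg_wait_days
print(f"expected review queue depth: {queue_depth:.0f} PRs")  # prints 18
```

So even if in-review cycle time is short, a steady depth of 18 PRs means each one waits a day and a half before anyone looks at it, which is why the two metrics must be read together.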

The metrics that matter

In the AI agent era, the metrics that matter for delivery performance are: P75 and P95 cycle time by work category, review queue depth over time, and the ratio of coding time to total cycle time. When coding time represents less than 10% of your total cycle time, optimizing code generation further is pointless — your leverage is in review, testing, and deployment.

The teams that deliver fastest will not be the ones who generate the most code. They will be the ones who keep the entire pipeline flowing.