Why traditional metrics fall short, and how modern frameworks like DORA and SPACE can guide better outcomes
For years, engineering leaders have relied on familiar metrics to gauge developer performance: story points, bug counts, and lines of code. These measures offered a shared baseline, especially in Agile environments where estimation and output needed a common language.
But in today’s AI-assisted world, those numbers no longer tell the full story. Performance isn’t just about volume or velocity—it’s about outcomes. Did the developer deliver the expected functionality, with the right quality, on time? That’s how we compensate today, and that’s still what matters. But how we measure those things must evolve.
With tools like GitHub Copilot, Claude Code, and Cursor generating entire functions, tests, and documentation quickly, output is becoming less about what a developer types and more about what they model, validate, and evolve.
The challenge for CIOs, CTOs, and SVPs of Engineering isn’t just adopting new tools. It’s rethinking how to measure effectiveness in a world where productivity is amplified by AI and complexity often hides behind automation.
Why Traditional Metrics Break Down
The problem isn’t that legacy metrics are useless. It’s that they’re easily gamed, misinterpreted, or disconnected from business value.
- Story points were never meant to be performance metrics. They were team-based estimates but are often misused to compare individuals.
- Lines of code favor verbosity over impact. In an AI-powered world, writing less can mean doing more.
- Velocity becomes misleading when AI inflates the number of completed tasks, even if they lack strategic value.
- Bug counts, often treated as the measure of quality, can reflect poorly understood user stories rather than poor coding.
At best, these metrics create noise. At worst, they drive harmful incentives, like rewarding speed over safety, or activity over alignment.
The Future of Measurement: Productivity, Quality, Functionality
The future of measurement hinges on three categories: productivity, quality, and functionality. These have always been essential to evaluating engineering work. But in the AI era, we must measure them differently. That shift doesn’t mean abandoning objectivity; it means updating our tools.
Today’s AI-assisted workflows lack mature solutions for tracking whether functionality requirements, like epics and user stories, have been fully met. But new approaches, like multi-domain linking (MDL), are emerging to close that gap. Measurement is getting smarter, and more connected, because it has to.
The Rise of Directional Metrics
Modern frameworks like DORA and SPACE were built to address these gaps.
DORA (DevOps Research and Assessment) focuses on:
- Deployment frequency
- Lead time for changes
- Change failure rate
- Mean time to restore service
These measure delivery health, not just effort. They’re useful for understanding how efficiently and safely value reaches users.
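To make that concrete, here’s a minimal sketch of how those four signals might be computed from deployment and incident records. The field names and sample data are assumptions for illustration, not a prescribed schema; in practice the numbers come from your CI/CD and incident tooling.

```python
from datetime import datetime, timedelta

# Hypothetical records; real data would come from CI/CD and incident tooling.
deployments = [
    {"committed_at": datetime(2025, 6, 1, 9),  "deployed_at": datetime(2025, 6, 2, 11), "caused_incident": False},
    {"committed_at": datetime(2025, 6, 3, 14), "deployed_at": datetime(2025, 6, 5, 10), "caused_incident": True},
    {"committed_at": datetime(2025, 6, 8, 16), "deployed_at": datetime(2025, 6, 9, 9),  "caused_incident": False},
]
incidents = [
    {"opened_at": datetime(2025, 6, 5, 10), "restored_at": datetime(2025, 6, 5, 14)},
]
window_days = 30

# Deployment frequency: deploys per week across the window.
deploys_per_week = len(deployments) / (window_days / 7)

# Lead time for changes: average commit-to-deploy time.
lead_time = sum((d["deployed_at"] - d["committed_at"] for d in deployments), timedelta()) / len(deployments)

# Change failure rate: share of deployments that triggered an incident.
change_failure_rate = sum(d["caused_incident"] for d in deployments) / len(deployments)

# Mean time to restore service: average incident duration.
mttr = sum((i["restored_at"] - i["opened_at"] for i in incidents), timedelta()) / len(incidents)

print(f"{deploys_per_week:.1f} deploys/week, lead time {lead_time}, "
      f"change failure rate {change_failure_rate:.0%}, MTTR {mttr}")
```

Notice that each number describes the delivery pipeline, not any individual’s effort.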
SPACE (developed by researchers from GitHub, Microsoft Research, and the University of Victoria) considers:
- Satisfaction and well-being
- Performance
- Activity
- Communication and collaboration
- Efficiency and flow
SPACE offers a more holistic view, especially in cross-functional and AI-assisted teams. It acknowledges that psychological safety, cross-team communication, and real flow states often impact long-term output more than individual commits.
AI Complicates the Picture
AI tools don’t eliminate the need for metrics; they demand smarter ones. When an LLM can write 80% of the code for a feature, how do we credit the developer? By the number of keystrokes? Or by their judgment in prompting, curating, and validating what the tool produced?
But here’s the deeper challenge: What if that feature doesn’t do what it was supposed to?
In AI-assisted workflows:
- Code volume no longer maps to effort
- Execution time can be reduced, but review and validation time increases
- Errors shift from logic bugs to alignment gaps—where delivered functionality doesn’t match requirements
Productivity isn’t just about output—it’s about fitness to purpose. Without strong traceability between code, tests, user stories, and epics, it’s easy for teams to ship fast but fall short of the business goal.
Many organizations today struggle to answer a basic question: Did this delivery actually fulfill the intended functionality?
This is where multi-domain linking (MDL) and AI-powered traceability show promise. By connecting user stories, requirements, test cases, design artifacts, and even user feedback within a unified graph, teams can use LLMs to assess whether the output truly matches the input.
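As a rough sketch of that idea, imagine the artifacts held in a single graph that an LLM can be asked to evaluate. The node types, links, ask_llm helper, and sample data below are hypothetical placeholders, not a specific MDL implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    kind: str                                  # "epic", "story", "test", "code", "feedback"
    id: str
    text: str
    links: list = field(default_factory=list)  # ids of related artifacts

# Hypothetical unified graph of delivery artifacts.
graph = {
    "EPIC-1":  Artifact("epic",  "EPIC-1",  "Customers can export invoices."),
    "STORY-7": Artifact("story", "STORY-7", "Export a single invoice as a PDF.", ["EPIC-1"]),
    "TEST-21": Artifact("test",  "TEST-21", "Asserts a PDF is produced for invoice 42.", ["STORY-7"]),
    "PR-88":   Artifact("code",  "PR-88",   "Adds the /invoices/{id}/export endpoint.", ["STORY-7"]),
}

def ask_llm(prompt: str) -> str:
    # Placeholder: wire this to whichever model API your team uses.
    return "stubbed model response"

def alignment_check(story_id: str) -> str:
    """Gather everything linked to a story and ask whether it was fully delivered."""
    story = graph[story_id]
    evidence = [f"{a.kind}: {a.text}" for a in graph.values() if story_id in a.links]
    prompt = (
        "Given this user story and the linked evidence, does the delivered work "
        f"fully satisfy the story? Story: {story.text} Evidence: {evidence}"
    )
    return ask_llm(prompt)

print(alignment_check("STORY-7"))
```

Even a simple structure like this turns “did we deliver what the story asked for?” into a question answered with linked evidence rather than recollection.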
And this capability unlocks more than just better alignment—it opens the door to innovation. AI-assisted development enables organizations to build more complex, interconnected, and adaptive systems than ever before. As those capabilities expand, so too must our ability to measure their economic value. What applications can we now build that we couldn’t before? And what is that worth to the business?
That’s not a theoretical exercise. It’s the next frontier in engineering measurement.
Productivity as a System, Not a Score
The best engineering organizations treat productivity like instrumentation. No single number can tell you what’s working, but the right mix of signals can guide better decisions. That system must account for both delivery efficiency and functional alignment. High velocity is meaningless if the outcome doesn’t meet the requirements it was designed to fulfill.
That means:
- Creating dashboards that show patterns, not just totals
- Blending technical metrics (DORA) with team dynamics (SPACE), as sketched below
- Tracking improvements over time, not absolutes per sprint
- Using metrics as coaching tools, not judgment tools
Most importantly, it means aligning measurement to what matters: Did the product deliver value? Did it meet its intended function? Was the effort worth the outcome? Those are the questions that still define success—and the ones our measurement frameworks must help answer.
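Here’s a minimal sketch of what a pattern-over-totals view could look like, blending DORA-style delivery signals with SPACE-style team signals per sprint. The fields, values, and labels are invented for illustration, not a recommended scorecard:

```python
# Hypothetical per-sprint snapshots mixing DORA-style delivery signals with
# SPACE-style team signals; fields and values are invented for illustration.
sprints = [
    {"sprint": "24.1", "lead_time_days": 4.0, "change_failure_rate": 0.18,
     "satisfaction": 3.4, "review_turnaround_hrs": 30},
    {"sprint": "24.2", "lead_time_days": 3.1, "change_failure_rate": 0.15,
     "satisfaction": 3.6, "review_turnaround_hrs": 26},
    {"sprint": "24.3", "lead_time_days": 2.6, "change_failure_rate": 0.16,
     "satisfaction": 3.9, "review_turnaround_hrs": 22},
]

HIGHER_IS_BETTER = {"satisfaction"}

def trend(metric: str) -> float:
    """Direction of travel across sprints, not an absolute score."""
    values = [s[metric] for s in sprints]
    return values[-1] - values[0]

for metric in ["lead_time_days", "change_failure_rate", "satisfaction", "review_turnaround_hrs"]:
    delta = trend(metric)
    better = delta > 0 if metric in HIGHER_IS_BETTER else delta < 0
    direction = "flat" if delta == 0 else ("improving" if better else "slipping")
    print(f"{metric}: {direction} ({delta:+.2f})")
```

The output reads as direction of travel for each signal rather than a single score to rank people by, which is the spirit of using metrics as coaching tools rather than judgment tools.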
How to Start Rethinking Measurement
If your metrics haven’t evolved alongside your tooling, here’s how to get started:
- Audit your current metric stack. What are you measuring, and what are you missing?
- Align on outcomes. What does “good” look like for your business, not just your codebase?
- Pick 2–3 directional metrics from DORA or SPACE that reflect your actual goals.
- Baseline and benchmark. Don’t look for high scores; look for trends and improvement.
- Build measurement into your retros. Turn metrics into prompts for discussion, not weapons for comparison.
AI is reshaping how software gets built. That doesn’t mean productivity can’t be measured—it means it must be measured differently. The leaders who shift from tracking motion to monitoring momentum will build faster, healthier, and more resilient engineering teams.
Robots & Pencils: Measuring What Matters in an AI-Driven World
At Robots & Pencils, we believe productivity isn’t a score—it’s a system. A system that must measure not just speed, but alignment. Did the output meet the requirements? Did it fulfill the epic? Was the intended functionality delivered?
We help clients extend traditional measurement approaches to fit an AI-first world. That means combining DORA and SPACE metrics with functional traceability—linking code to requirements, outcomes to epics, and user stories to business results.
Our secure, AWS-native platforms are already instrumented for this kind of visibility. And our teams are actively designing multi-domain models that give leaders better answers to the questions they care about most.
As AI opens the door to applications we never thought were possible, our job is to help you measure what matters—including what’s newly possible. We don’t just help teams move faster. We help them build with confidence—and prove it.