Product, not PowerPoint: How to Evaluate Enterprise AI Partners 

A practical framework for enterprise AI vendor selection that prioritizes functional product. 

There is a simple truth in basketball: when someone claims they can dunk, you do not want their biography. You want to see them take off, rise above the rim, and throw it down. Until the ball goes through the hoop, everything else is just pregame chatter. 

Traditional business pitches are no different. Slide after slide explains talent, process, and commitment to excellence. Everyone insists they are fast, strategic, and powered by artificial intelligence. It all blends together. 

And just as in basketball, none of it matters until you see the dunk. 

Why Enterprise AI Partner Evaluation Has Changed 

I have spent the last year watching something shift in how enterprise buyers evaluate technology partners. The change is not subtle. AI collapsed the timeline for what is possible. Engineers use artificial intelligence to automate repetitive tasks, reveal gaps, and support rapid iteration. User experience teams model real behavior and refine interactions in a fraction of the usual time. Designers explore and adapt visual directions quickly while matching a client’s brand and needs. At the strategy level, artificial intelligence helps teams explore concepts, identify edge cases, and clarify problems before anyone designs anything or writes code. 

Teams can now build first versions far earlier than they once could. It is now possible to walk into a meeting with something real, rather than something hypothetical. 

Traditional Evaluation Arrives Too Late  

Yet enterprise evaluation still moves as if early builds take months. Teams can create quickly, but organizations are asked to decide slowly. Forrester’s 2024 Buyers’ Journey Survey reveals the scale of this shift: 92% of B2B buyers now start with at least one vendor in mind, and 41% have already selected their preferred vendor before formal evaluation even begins. Traditional vendor selection leans on slides that outline intent, case studies that point backward, and demos that highlight features. These keep judgment at arm’s length and often arrive too late to matter. 

An early milestone changes that dynamic. A deck explains. A first version proves. 

What Functional Products Reveal About AI Vendors 

A healthcare technology company came to us through a partner referral. They needed to modernize their pharmacy network’s web presence, which included hundreds of independent pharmacy websites, each with unique branding and content, all needing migration into a modern, SEO-optimized content management system. They had already sat through multiple vendor presentations that week. Each promised speed, AI capabilities, and transformation. 

At Robots & Pencils, we stopped presenting what we could do and started showing what we already built. 

Building the Functional Product in 10 Days 

Our team had a week and a half. Our engineers used AI agents to automate content scraping and migration. Our UX team modeled user flows and tested assumptions in days instead of weeks. Our designers explored visual directions that preserved each pharmacy’s brand identity while modernizing the experience. Our strategy team identified edge cases and clarified requirements before a single line of production code was written. 

We walked into the meeting with a functional product. 

The Client Demo: Testing Real Data in Real Time 

The client entered one of their pharmacy’s existing URLs into our interface. They selected brand colors. They watched our AI agents scrape content, preserve branding structure, and generate a modern, mobile-responsive website in real time. Within minutes, they were clicking through an actual functioning site built on a production-grade CMS with an administrative backend. This was not a mockup or a demo, but a working system processing their real data. 

The entire conversation shifted. They immediately started testing edge cases. What about mobile responsiveness? We showed them the mobile view that we had already built based on pre-meeting feedback. What about the administrative interface? We walked them through the CMS backend where content could be updated. They stopped asking, “Can you do this?” and started asking, “What else can we build together?” and “How quickly can we expand this?” 

After the meeting, their feedback was direct: “I appreciate the way you guys approached us. Going through the demo, it wasn’t just this nebulous idea anymore. It was impressive from a build standpoint and from an administration standpoint.” 

Why Early Functional Products Prevent Partnership Failures 

When clients see a working product, even in its earliest form, they lean forward. They explore. They ask questions. They do not want to return to a deck once they have interacted with actual software. And this is precisely why the approach works. 

Most enterprise partnerships that fail do not fail because of weak engineering or design. They fail because teams hold different pictures of the same future, and those differences stay hidden until it is too late to course correct easily. A shared early version fixes that. Everyone reacts to the same thing. Misalignments surface when stakes are low. You learn how a partner listens, how they adjust, and how you work through ambiguity together. No deck presentation can show these things. 

How Early Functional Delivery Transforms Vendor Selection 

The Baseline Iteration Before Contract Signing 

At Robots & Pencils, we think of this functional product as more than a prototype. It is the baseline iteration delivered before contract signing. It shapes how the partnership forms. The client comes into the work from the start. Their data, ideas, and context shape what gets built. 

Why This Approach Stays Selective 

Because this early delivery takes real effort and investment on our part, we keep the process selective. We reserve early functional product development for organizations that show clear intent and strong alignment. The early artifact becomes the first shared step forward, rather than the first sales step. 

The Lasting Impact on Partnership Formation 

When you start by delivering something meaningful, you set the tone for everything that follows. The moment that first version hits the court, the moment you see the lift, the rim, and the finish, the entire relationship changes. 

In the end, the same lesson from basketball holds true. People do not remember the talk. They remember the dunk. And we would rather spend our time building something real than explaining what we could build. 

If you want to explore what it looks like to begin with real work instead of a pitch, we would love to continue the conversation. Let’s talk. 


FAQs

How long does early functional delivery take to create? 

Early functional product delivery typically takes 5-10 days, depending on complexity and data availability. At Robots & Pencils, we focus on demonstrating how we interpret requirements, handle real constraints, and collaborate under actual conditions rather than achieving feature completeness. 

What makes this approach different from a proof of concept? 

Unlike traditional proofs of concept, our baseline iteration is built with the client’s actual data and reflects real-world constraints from day one. It demonstrates partnership dynamics and problem-solving approach, not just technical capability. 

Which types of organizations are best suited for this approach? 

Organizations that show clear intent, strong alignment on objectives, and readiness to engage collaboratively benefit most from early functional delivery. This approach works best when both parties are committed to testing the partnership through real work rather than presentations. 

Can this approach work for regulated industries like healthcare or financial services? 

Yes. We’ve successfully delivered early functional products for healthcare technology companies and financial services organizations. The approach adapts to industry-specific requirements while maintaining rapid delivery timelines. 

Robots & Pencils Opens Studio for Generative and Agentic AI in Bellevue

The Seattle-area AI Studio is live, growing, and hiring engineers and builders ready to deliver impact at velocity. 

Robots & Pencils, an applied AI engineering partner known for high-velocity delivery and measurable business outcomes, today announced the opening of its Studio for Generative and Agentic AI in Bellevue.  

Candidates seeking high-impact engineering, data, and design roles can learn more at robotsandpencils.com/careers. 

A Strategic Expansion to Meet Demand for Rapid Enterprise AI 

The Studio in downtown Bellevue is fully operational and actively building its founding team as enterprise demand accelerates for AI systems that move from experimentation to production with speed, precision, and accountability. 

The Studio expands Robots & Pencils’ AI-native delivery model and represents a significant step in the company’s U.S. growth, supported by global operations in Cleveland, Calgary, Toronto, Bogotá, and Lviv. It adds meaningful capacity to support organizations launching AI-enabled products, platforms, and agentic systems at scale. 

Strong Leadership Driving Focus and Velocity 

The Studio in Bellevue operates under the leadership of Jeff Kirk, Executive Vice President of Applied AI at Robots & Pencils, and reinforces the company’s growing presence in the Pacific Northwest while serving global clients pursuing ambitious AI initiatives. 

“This Studio is designed for builders who want real ownership and real impact,” said Kirk. “We are bringing together experienced teams who move quickly, think clearly, and take responsibility for outcomes. Our Studio model gives people the trust and focus to make strong decisions and deliver AI systems that translate directly into business value.” 

Working with AWS to Accelerate Enterprise AI Delivery 

As an Amazon Web Services Partner located near Amazon headquarters, the Studio in Bellevue supports clients building and scaling AI solutions on Amazon Bedrock, Amazon SageMaker, Amazon Bedrock AgentCore, Amazon Quick Suite, and related AWS services. This proximity strengthens collaboration and supports faster experimentation and production-ready delivery for complex enterprise environments. 

Robots & Pencils was recently selected as one of 11 inaugural partners in the invite-only AWS Pattern Partners program. The program works with a select group of consulting partners to define how enterprises adopt next-generation AI and emerging technologies on AWS through validated, repeatable patterns. 

This recognition acknowledges Robots & Pencils’ experience delivering production-grade AI architectures for enterprise customers. Working with AWS, the company supports secure and scalable AI delivery across regulated and high-impact industries while enabling teams to move with clarity and confidence from design through deployment. 

A Destination for Elite AI Builders 

The Studio for Generative and Agentic AI reflects Robots & Pencils’ long-standing commitment to talent density and engineering craft. Employees average fifteen years of experience and contribute patents, published research, and category-defining products across industries. The Studio in Bellevue offers engineers, applied AI specialists, product leaders, and user experience innovators the opportunity to shape a new hub while influencing high-stakes client work from the ground up. 

“To support our substantial client demand, we need incredible GenAI talent and are significantly investing in how we work with AWS. Our Bellevue AI Studio places our teams in close proximity to AWS, creating an environment that supports knowledge sharing and enables us to tap into the Seattle-area hotbed of incredible, wicked-smart talent,” said Len Pagon Jr., CEO of Robots & Pencils. “The Bellevue location expands our ability to deliver applied AI outcomes at scale while creating an environment where experienced builders can do the most meaningful work of their careers. This expansion reflects confidence in our teams and the direction we are taking the company.” 

Velocity Pods Deliver AI Products in Weeks 

Teams in the Studio operate in industry-focused Velocity Pods supporting Education, Energy, Financial Services, Healthcare, Manufacturing, Transportation, and Retail and CPG. These pods launch generative and agentic AI products to market in 30-to-45-day cycles while addressing complex modernization and intelligent automation programs across the enterprise. 

Now Hiring for AI Engineering Jobs in Bellevue 

Robots & Pencils is actively staffing the Studio for Generative and Agentic AI in Bellevue and invites experienced engineers and builders to apply. Open roles span engineering, applied AI, product, and design. 

Interested candidates can explore opportunities and submit applications at robotsandpencils.com/careers. 

The Studio in Bellevue opens with momentum, leadership, and a clear mandate to build AI solutions that matter.  

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request an AI briefing.

Build vs. Buy for Conversational AI Agents: Why the Future Belongs to Builders 

You can feel the shift the moment you try to deploy a conversational AI agent through an off-the-shelf platform. The experience looks clean and efficient on the surface, yet it rarely creates the natural, personal, assistive interactions customers expect. It routes and deflects with precision, but the user often leaves without real progress. For teams focused on modern customer experience, that gap becomes impossible to ignore. 

Most “buy” options in conversational AI grew out of call center design. Their core purpose supports internal efficiency rather than meaningful customer support. 

The Tools on the Market Prioritize Operations Over Experience 

Commercial conversational AI platforms concentrate on routing, handle time, and contact center workflows. Their architecture directs intelligence toward internal productivity. Customers receive an experience shaped by legacy operational goals, which leads to uniform patterns across organizations. 

Many buyers assume these tools match customer needs. A simple scenario helps reset that assumption. 

A more experience-centric path creates a very different outcome. Picture a manufacturing technician on a production line who notices a calibration issue on a piece of equipment. A contact-center-oriented system assists the internal support team by surfacing documentation, troubleshooting steps, and recommended scripts. The support team responds quickly, but the technician still waits for guidance during a critical moment on the floor. 

A true customer-facing agent, by contrast, engages directly with the technician. It reviews the equipment profile, interprets sensor readings, outlines safe adjustment steps, and highlights the specific parameters that require attention. The technician gains clarity during the moment of need. Production continues with confidence and momentum. 

This direct guidance transforms the experience. The agent participates in the workflow as a real-time partner rather than a relay for internal teams. 

Your Conversational Data Creates the Moat 

Every customer question reflects a need. Every phrasing choice, pause, and follow-up captures intent. These patterns form the foundation of a truly assistive conversational AI system. They reveal friction, opportunity, and the natural language of your specific users. 

SaaS solutions provide insights from these interactions, while the deeper value accumulates inside the vendor’s system. Their product evolves with your customer patterns, while your experience evolves at a slower pace. 

Modern AI creates advantage through data, not through foundation models. Conversation data reinforces your knowledge of customers and shapes your ability to improve rapidly. Ownership of that data creates the moat that strengthens with every interaction. 

Customization Creates the Quality Customers Feel 

The visible layer of an AI agent, including the interface, avatar, or voice, offers the simplest design challenge. Real quality lives underneath. Tone calibration, workflow logic, domain vocabulary, and retrieval strategy shape the accuracy and trustworthiness of every response. 

Generic templates often reach steady performance at a moderate level of accuracy. The shift into high-trust reliability grows from tuning against your specific customer language and your operational context. SaaS platforms hold the data, but they do not hold the lived knowledge required to interpret which interactions reflect success, friction, or emerging need. Your teams understand the nuance, which creates a tuning loop that only internal ownership can support. 

A system that learns within the grain of your business always outperforms a template that treats your conversations as generic. 

Building Thrives Through Modern Ecosystems 

Building once required full-stack engineering and long timelines. Today, teams assemble ecosystems that include hosted models, vector databases, retrieval frameworks, and orchestration layers. This approach delivers speed and preserves data governance.  

 Many buyers assume building is slow. New modular tools make the opposite true.  

Advantage grows from how your system comes together around your data. Lightweight architectures adapt quickly and evolve in rhythm with your customers. 
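The assembly described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a production recipe: the `VectorStore` class is a hypothetical in-memory stand-in for a managed vector database, retrieval is reduced to crude word overlap in place of a hosted embedding model, and the model call inside `answer` is stubbed rather than a real LLM endpoint.

```python
# Illustrative sketch of the modular "build" stack: a stand-in vector
# store, a retrieval step, and an orchestration layer that grounds a
# (stubbed) hosted-model call in data the organization owns.
# All names here are hypothetical placeholders, not a real vendor API.
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a real build would use a hosted embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class VectorStore:
    """Toy stand-in for a vector database: stores docs, ranks by word overlap."""

    def __init__(self) -> None:
        self.docs: list[str] = []

    def add(self, text: str) -> None:
        self.docs.append(text)

    def search(self, query: str) -> str:
        # Rank documents by how many query words they share (crude retrieval).
        q = tokenize(query)
        return max(self.docs, key=lambda d: len(q & tokenize(d)))


def answer(query: str, store: VectorStore) -> str:
    """Orchestration layer: retrieve owned knowledge, then prompt the model.

    The model call is stubbed; a real system would send the retrieved
    context plus the query to a hosted LLM endpoint.
    """
    context = store.search(query)
    return f"Based on our records: {context}"


store = VectorStore()
store.add("Calibration drift on Line 3 is corrected by adjusting parameter P-204.")
store.add("Holiday shipping cutoffs are posted each November.")
print(answer("How do I fix calibration drift on the line?", store))
```

The point of the sketch is the shape, not the components: each layer is replaceable, and the conversation data flowing through it stays in your hands, which is where the compounding advantage described above comes from.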

The Strategic Equation Favors Builders 

AI-native experience design has reshaped the traditional build vs. buy decision. Modern tooling accelerates internal development, and internal data governance strengthens safety. A build path creates forward momentum without relying on vendor roadmaps. 

Differentiation comes from experience quality. Off-the-shelf bots produce uniform interactions across brands. Custom agents express your language, workflows, and service model. 

Data stewardship defines long-term success in conversational AI. Ownership of the learning loop positions teams to adapt quickly, evolve responsibly, and compound knowledge over time. 

The Organizations That Win Will Be the Ones That Learn Fastest 

In the next wave of digital experience, leaders rise through insight and adaptability. Their advantage reflects what they learn from every conversation, how quickly they apply that learning, and how deeply their AI mirrors the needs of their customers. 

Buying provides a tool. Building creates a learning system. And learning carries the greatest compounding force in customer experience. 




FAQs 

What creates value in a conversational AI agent? 

Value grows from the quality of the interaction. Conversational AI agents reach their potential when they draw from real customer language, understand business context, and evolve through continuous learning. Ownership of conversation data strengthens this process and elevates the customer experience. 

Why do organizations choose to build conversational AI? 

Organizations choose a build strategy to shape every element of the experience. Internal development allows teams to guide tone, safety, workflow logic, and response quality. This alignment creates reliable, natural, and assistive interactions that match customer expectations. 

How does conversation data strengthen an AI agent? 

Every user question reveals intention, preference, and behavior. These signals guide tuning, improve routing, and highlight gaps in knowledge sources. Data ownership empowers organizations to refine the agent with precision and create rapid compound learning. 

How do modern AI tools support faster internal development? 

Hosted large language models, retrieval infrastructures, vector databases, and orchestration frameworks provide ready-to-use building blocks. Teams assemble these components into a modular system designed around their data and their customer experience goals. 

What advantages emerge when teams customize their AI agents? 

Customization aligns the agent with domain language, operational processes, and brand voice. This alignment raises accuracy, builds trust, and creates a conversational experience that feels tailored and assistive. 

How does a build approach create long-term strategic strength? 

A build approach cultivates an internal learning engine. Every conversation sharpens the agent, strengthens customer relationships, and expands organizational knowledge. This compounding effect creates durable advantage in digital experience. 

Accelerating Innovation with AWS: Robots & Pencils Selected as an AWS Pattern Partner 

Today, Robots & Pencils joins AWS as a launch partner in the AWS Pattern Partners program, an invite-only initiative that works with a select cohort of consulting partners to define how enterprises adopt next-generation AI and emerging technologies on AWS. 

As a Pattern Partner, Robots & Pencils brings proven success with emerging technologies on AWS, including AI/ML, Generative/Agentic AI, Robotics, Space Technology, and Quantum. The program focuses on accelerating enterprise adoption through repeatable, scalable patterns that encode tested ways to solve specific business problems, with architecture, controls, and delivery practices that have already been validated with customers. 

For customers, selection of Robots & Pencils into this program signals that AWS has reviewed and endorsed both the outcomes and the operating model behind the work delivered in these domains. Enterprises that face pressure to modernize critical processes, adopt AI safely, and respond to new regulatory and security requirements gain access to patterns that have already delivered measurable results. 

The Pattern Partners program also sets a clear horizon view for emerging technology. In the near term, it concentrates on AI/ML and Generative & Agentic AI patterns, including subdomains such as Process to Agent (P2A), Agent to Agent (A2A), Responsible AI, and RegAI. Over the midterm, the program extends these capabilities into connected environments that use Robotics, IoT, Edge, and Space Technology on AWS. For the long term, it explores Quantum and next-generation enterprise innovations, aligning new capabilities with existing AWS investments in data, AI, and security as they mature into reliable patterns. 

Our Pattern: Enterprise Document Intelligence Platform 

At the heart of Robots & Pencils’ participation in Pattern Partners is a flagship pattern that the company is co-developing and scaling with AWS. 

The Customer Problem 

Organizations in Energy, Manufacturing, and Health & Wellness face a common set of challenges. Data and workflows sit in disconnected systems, which slows AI adoption and creates duplicated effort. Teams find it difficult to govern AI models and agents at enterprise scale, especially when regulations and internal standards move quickly. Talent and process gaps make it hard to adopt new technology in a way that satisfies risk, compliance, and operational leaders. 

Our Joint Approach with AWS 

Together with AWS, Robots & Pencils has designed the Enterprise Document Intelligence Platform. The pattern combines three elements: an architecture built natively on AWS using Amazon Bedrock, Amazon SageMaker, and Amazon Bedrock AgentCore; an operating model with clear roles, runbooks, and guardrails for IT, data, security, and business teams; and accelerators such as pre-built integrations, automations, policies, templates, dashboards, and agents. The pattern is being refined through a time-boxed incubation with a set of lighthouse customers. As it matures, it is packaged as a Pattern Package so that more joint customers can adopt it rapidly with consistent results. 

Early Results 

Early adopters are already reporting tangible outcomes from the Robots & Pencils Enterprise Document Intelligence Platform. With 2 million interactions across 100,000+ users, customers reported a 90% satisfaction score and a 40% improvement in confidence in responses from the pattern. 

As these results are validated across additional lighthouse customers, the Pattern Package becomes available to AWS field teams globally. This enables customers in new regions and sectors to benefit from the same proven approach without restarting design from the beginning. 

How the Pattern Partners Program Works with Customers 

When a customer engages Robots & Pencils through the Pattern Partners program, the engagement starts from a proven blueprint, not from scratch. The Pattern Package already encodes successful implementations, including architectures, guardrails, and playbooks. Customers receive coordinated support from AWS specialists, the AWS Consulting COE Pattern Partner team, and experts from Robots & Pencils across consulting, engineering, and product. 

The program design supports fast yet responsible experimentation. Customers can move from idea to live pilot while maintaining enterprise-grade security, compliance, and governance. The pattern also includes a clear path from pilot to scale, so organizations can extend from initial deployments to cross-region and multi-business-unit rollouts with ongoing optimization. 

Being part of the AWS Pattern Partners program allows Robots & Pencils to bring emerging AWS capabilities such as Generative AI and Agentic applications to customers earlier. Guardrails and controls stay clear and well-defined. The company can turn its strongest customer successes into repeatable assets that benefit a wider set of organizations. Collaboration with AWS field teams, solution architects, and service teams keeps the pattern aligned with the latest platform innovation. Robots & Pencils also contributes back to the broader AWS partner ecosystem by sharing learnings and raising the standard for how emerging technology is adopted. For customers, this approach reduces risk, increases predictability, and accelerates business impact from AWS investments. 

Partner Perspective 

“Joining AWS Pattern Partners is a strategic milestone for Robots & Pencils,” said Jeff Kirk, Executive Vice President of Applied AI, Robots & Pencils. “With our Enterprise Document Intelligence Platform, we turn our strongest customer wins into a clear, repeatable path that reduces onboarding time for customers in need of intelligent search and increases confidence in the accuracy of results, so customers can move from pilots to production with greater speed, control, and confidence.”  

AWS Perspective 

“AWS created Pattern Partners to work with a select cohort of builders who can set the standard for how enterprises adopt emerging technology on AWS. Robots & Pencils brings deep expertise in KnowledgeOps, including RAG and compound systems, and a proven pattern in the Enterprise Document Intelligence Platform that is already delivering measurable outcomes for customers,” said Brian Bohan, Managing Director of Consulting COE, AWS. “We look forward to scaling this work together and bringing these benefits to more joint customers across industries.”  

Next Steps 

Customers interested in these patterns can speak with Robots & Pencils through robotsandpencils.com/contact to review current challenges and identify which patterns are most relevant. 

Those that want to explore the Enterprise Document Intelligence Platform in depth or learn how the AWS Pattern Partners program could support their own roadmap can request a focused discovery session. In that conversation, AWS and Robots & Pencils work with stakeholders to map business challenges to the pattern, estimate potential impact, and define a practical path to adoption. 

Together, AWS and Robots & Pencils look forward to turning critical business challenges into repeatable, scalable patterns for growth. 




FAQs

What is the AWS Pattern Partners program?

It is an invite-only AWS initiative that works with a select group of consulting partners to define how enterprises adopt next-generation AI and emerging technologies through validated, repeatable patterns.

Why was Robots & Pencils selected as a Pattern Partner?

AWS recognized the company’s proven outcomes across AI and emerging technologies, as well as its track record delivering measurable results with scalable architectures and operating models.

What is the Enterprise Document Intelligence Platform?

It is a jointly designed pattern that uses AWS native services and accelerators to help organizations unify data, streamline governance, and deploy Generative and Agentic AI across complex environments.

Which AWS technologies power the pattern?

Key services include Amazon Bedrock, Amazon SageMaker, and Amazon Bedrock AgentCore, along with AWS controls, security practices, and operational frameworks.

Who benefits most from this pattern?

Enterprises in sectors like Energy, Manufacturing, and Health and Wellness that face challenges with disconnected data, evolving regulations, and the need for responsible AI adoption at scale.

What results have early adopters seen?

Customers reported 2 million interactions across more than 100,000 users, a 90 percent satisfaction score, and a 40 percent improvement in confidence in response accuracy.

How does the program support faster innovation?

Organizations begin with a proven blueprint rather than a blank page. This accelerates pilots while maintaining enterprise grade governance and provides a clear pathway to large scale deployment.

How do customers engage?

Teams can connect through robotsandpencils.com/contact to discuss current challenges or request a focused discovery session to understand fit, impact potential, and next steps.

What does this mean for long term innovation?

The program continually extends into new domains, guiding enterprises through emerging capabilities such as Robotics, IoT, Space Technology, and Quantum as they mature into reliable patterns.

Robots & Pencils Plans Seattle-area Expansion with Studio for Generative & Agentic AI 

The Bellevue, Washington investment opens pathways for forward deployed engineers and builders seeking career-defining work in applied AI. 

Robots & Pencils, an applied AI engineering partner known for high-velocity delivery and measurable business outcomes, today announced plans to open a Seattle-area Studio for Generative & Agentic AI in downtown Bellevue in early January 2026. The expansion fuels the next phase of growth for the company’s AI-native Studio and strengthens North American delivery, as demand for AI-enabled product engineering accelerates across the United States. As an Amazon Web Services (AWS) Partner, the Bellevue location, with its proximity to Amazon headquarters, is a natural site to accelerate client AI solutions on Amazon Bedrock, Amazon SageMaker, Amazon Bedrock AgentCore, and more. 

Candidates seeking high-impact engineering roles can learn more at robotsandpencils.com/careers. 

The new Studio reflects a growing U.S. footprint supported by existing global operations in Cleveland, Calgary, Toronto, Bogotá, and Lviv. The Studio organizes cross-functional product, engineering, data, and design talent into vertical industry-focused pods that support sectors such as Education, Energy, Financial Services, Healthcare, Manufacturing, Transportation, and Retail/CPG. The presence in the Seattle area adds meaningful engineering capacity and enhances support for clients pursuing ambitious AI programs and large-scale modernization work. 

“The investment in Bellevue and access to deep talent in the Pacific Northwest gives our teams and our clients a powerful new chapter,” said Len Pagon Jr., CEO of Robots & Pencils. “The engineering expertise in this region aligns perfectly with our Studio strategy. We see tremendous opportunities to grow our talent base, strengthen delivery, and help organizations reach AI outcomes that advance their businesses. Our teams are energized by this expansion and ready for the momentum ahead.” 

Jeff Kirk, Executive Vice President of Applied AI at Robots & Pencils, will lead the Bellevue studio. “The Studio in Bellevue is a pivotal investment in our client and talent strategy,” said Kirk. “Engineers and builders in this region bring the experience and ambition that shape industry-defining solutions. Speed matters, and our Studio structure is designed for launching AI products to market every 30 to 45 days. The Seattle area strengthens the engineering capacity required to deliver that velocity at scale. We look forward to building a team that thrives on complex challenges and produces work that matters.” 

Robots & Pencils continues to invest in environments where elite talent can perform at the highest level. The company is known for its talent density, with teams averaging fifteen years of experience and contributing patents, published research, and category-shaping products across industries. The Studio creates space for engineers, applied AI specialists, product leaders, and user experience innovators to influence major client engagements and shape a new hub from the ground up. It anchors work in AI systems, agents and agentic workflows, digital modernization, intelligent automation, and data-driven product innovation. 

Interested applicants can explore open roles at robotsandpencils.com/careers. The Studio is ready for builders who want to shape the next era of AI solutions with momentum and purpose. 

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request an AI briefing.

The Agentic Trap: Why 40% of AI Automation Projects Lose Momentum

Gartner’s latest forecast is striking: more than 40% of agentic AI projects will be canceled by 2027. At first glance, this looks like a technology growing faster than it can mature. But a closer look across the industry shows a different pattern. Many initiatives stall for the same reason micromanaged teams do. The work is described at the level of steps rather than outcomes. When expectations aren’t clear, people wait for instructions. When expectations aren’t clear for agents, they either improvise poorly or fail to act. 

This is the same shift I described in my previous article, “Software’s Biggest Breakthrough Was Making It Cheap Enough to Waste.” When software becomes inexpensive enough to test freely, the organizations that pull ahead are the ones that work toward clear outcomes and validate their decisions quickly. 

Agentic AI is the next stage of that evolution. Autonomy becomes meaningful only when the organization already understands the outcome it’s trying to achieve, how good decisions support that outcome, and when judgment should shift back to a human. 

The Shift to Outcome-Oriented Programming 

Agentic AI brings a model that feels intuitive but represents a quiet transformation. Traditional automation has always been procedural in that teams document the steps, configure the workflow, and optimize the sequence. Like a highly scripted form of people management, this model is effective when the work is predictable, but limited when decisions are open-ended or require problem solving. 

Agentic systems operate more like empowered teams. They begin with a desired outcome and use planning, reasoning, and available tools to move toward it. As system designers, our role shifts from specifying every step to defining the outcome, the boundaries, and the signals that guide good judgment. 

Instead of detailing each action, teams clarify the desired outcome, the acceptable boundaries, and the signals that guide good judgment. 

This shift places new demands on organizational clarity. To support outcome-oriented systems, teams need a shared understanding of how decisions are made. They need to determine what good judgment looks like, what tradeoffs are acceptable, and how to recognize situations that require human involvement. 

Industry research points to the same conclusion. Harvard Business Review notes that teams struggle when they choose agentic use cases without first defining how those decisions should be evaluated. XMPRO shows that many failures stem from treating agentic systems as extensions of existing automation rather than as tools that require a different architectural foundation. RAND’s analysis adds that projects built on assumptions instead of validated decision patterns rarely make it into stable production. 

Together, these findings underscore a simple theme. Agents thrive when the organization already understands how good decisions are made. 

Decision Intelligence Shapes Agentic Performance  

Agentic systems perform well when the outcome is clear, the signals are reliable, and proper judgment is well understood. When goals or success criteria are fuzzy, or tasks overly complex, performance mirrors that ambiguity. 

In a Carnegie Mellon evaluation, advanced models completed only about one-third of multi-step tasks without intervention. Meanwhile, First Page Sage’s 2025 survey showed much higher completion rates in more structured domains, with performance dropping as tasks became more ambiguous or context heavy. 

This reflects another truth about autonomy. Some problems are simply too broad or too abstract for an agent to manage directly. In such cases, the outcome must be broken into sub-outcomes, and those into smaller decisions, until the individual pieces fall within the system’s ability to reason effectively. 

In many ways, this mirrors effective leadership. Good leaders don’t hand individual team members a giant, unstructured mandate. They cascade outcomes into stratified responsibilities that people can act on. Agentic systems operate the same way. They thrive when the goal has been decomposed into solvable parts with well-defined judgment and guardrails. 

This is why organizational clarity becomes a core predictor of success. 

How Teams Fall Into the Agentic Trap 

Many organizations feel the pull of agentic AI because it promises systems that plan, act, and adapt without waiting for human intervention. But the projects that stall often fall into a predictable trap. 

Teams begin by automating the process instead of automating the judgment behind the decisions the agent is expected to make. They define what a system should do instead of defining how to evaluate the output or what “good” should look like. Vague quality metrics, progress signals, and escalation criteria lead to technically valid but strategically mediocre decisions that erode confidence in the system. 

The research behind this pattern is remarkably consistent. HBR notes that teams often choose agentic use cases before they understand the criteria needed to evaluate them. XMPRO describes the architectural breakdowns that occur when agentic systems are treated like upgrades to procedural automation. RAND’s analysis shows that assumption-driven decision-making is one of the strongest predictors of AI project failure, while projects built on clear evaluation criteria and validated decision patterns are far more likely to reach stable production. 

This is the agentic trap: trying to automate judgment without first understanding how good judgment is made. Agentic AI is more than the automation of steps; it’s the automation of evaluation, prioritization, and tradeoff decisions. Without clear outcomes, criteria, signals, and boundaries to inform decision-making, the system has nothing stable to scale, and its behavior reflects that uncertainty. 

A Practical Way Forward: The Automation Readiness Assessment 
Decisions that succeed under autonomy share five characteristics. When one or more are missing, agents need more support: 

Have all five? Build with confidence. 
Only three or four? Pilot with human review to build up a live data set. 
Only one or two? Go strengthen your decision clarity before automating. 

This approach keeps teams grounded. It turns autonomy from an aspirational leap into a disciplined extension of what already works. 
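The readiness tiers above can be operationalized as a simple scoring rubric. The criterion names in this sketch are illustrative placeholders drawn from the article's themes (outcomes, evaluation criteria, signals, boundaries, validated patterns), not an official list:

```python
# Hypothetical readiness rubric: criterion names are illustrative
# assumptions, not the assessment's official list.
CRITERIA = [
    "clear_outcome",        # the desired result is explicit and measurable
    "evaluation_criteria",  # "good" output can be judged objectively
    "reliable_signals",     # progress and quality signals exist
    "known_boundaries",     # acceptable tradeoffs and guardrails are defined
    "validated_pattern",    # humans have made this decision well, repeatedly
]

def readiness(decision: dict) -> str:
    """Map how many criteria a candidate decision meets to a recommendation."""
    score = sum(1 for c in CRITERIA if decision.get(c, False))
    if score == 5:
        return "build"    # automate with confidence
    if score >= 3:
        return "pilot"    # automate with human review
    return "clarify"      # strengthen decision clarity before automating

example = {"clear_outcome": True, "evaluation_criteria": True,
           "reliable_signals": True, "known_boundaries": False,
           "validated_pattern": False}
print(readiness(example))  # pilot
```

A scoring lens like this keeps the conversation concrete: each candidate decision gets a tier, and the tier dictates the level of human oversight.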

The Path to Agentic Maturity 

Agentic AI expands an organization’s capacity for coordinated action, but only when the decisions behind the work are already well understood. The projects that avoid the 40% failure curve do so because they encode judgment into agents, not just process. They clarify the outcome, validate the decision pattern, define the boundaries, and then let the system scale what works. 

Clarity of judgment produces resilience, resilience enables autonomy, and autonomy creates leverage. The path to agentic maturity begins with well-defined decisions. Everything else grows from there. 



Key Takeaways 


FAQs 

What is the “agentic trap”? 
The agentic trap describes what happens when organizations rush to deploy agents that plan and act, before they have defined the outcomes, decision criteria, and guardrails those agents require. The technology looks powerful, yet projects stall because the underlying decisions were never made explicit. 

How is agentic AI different from traditional automation? 
Traditional automation follows a procedural model. Teams document a sequence of steps and the system executes those steps in predictable conditions. Agentic AI starts from an outcome, uses planning and reasoning to choose actions, and navigates toward that outcome using tools, data, and judgment signals. The organization moves from “here are the steps” to “here is the result, the boundaries, and the signals that matter.” 

Why do so many agentic AI projects lose momentum? 
Momentum fades when teams try to automate decisions that have not been documented, validated, or measured. Costs rise, risk concerns surface, and it becomes harder to show progress against business outcomes. Research from Gartner, Harvard Business Review, XMPRO, and RAND all point to the same pattern: projects thrive when the decision environment is explicit and validated, and they struggle when it is based on assumptions. 

What makes a decision “ready” for autonomy? 
Decisions are ready for agentic automation when they meet five criteria: 

The more of these elements are present, the more confidently teams can extend autonomy. 

How can we use the Automation Readiness Assessment in practice? 
Use the five criteria as a simple scoring lens for each candidate decision: 

This keeps investment aligned with decision maturity and creates a clear path from experimentation to durable production. 

Where should leaders focus first to reach agentic maturity? 
Leaders gain the most leverage by focusing on judgment clarity within critical workflows. That means aligning on desired outcomes, success metrics, escalation thresholds, and the signals that inform good decisions. With that foundation, agentic AI becomes a force multiplier for well-understood work rather than a risky experiment in ambiguous territory. 

Software’s Biggest Breakthrough Was Making It Cheap Enough to Waste 

AI and automation are making development quick and affordable. Now, the future belongs to teams that learn as fast as they build. 

Building software takes patience and persistence. Projects run long, budgets stretch thin, and crossing the finish line often feels like survival. If we launch something that works, we call it a win. 

That rhythm has defined the industry for decades. But now, the tempo is changing. Kevin Kelly, the founding executive editor of Wired Magazine, once said, “Great technological innovations happen when something that used to be expensive becomes cheap enough to waste.” 

AI-assisted coding and automation are eliminating the bottlenecks of software development.  What once took months or years can now be delivered in days or weeks. Building is no longer the hard part. It’s faster, cheaper, and more accessible than ever.  

Now, as more organizations can build at scale, custom software becomes easier to replicate, and its ROI as a competitive advantage grows less predictable. As product differentiation becomes more difficult to maintain, a new source of value emerges: applied learning, or how effectively teams can build, test, adapt, and prove what works. 

This new ROI is not predicted. It depends on the ability to:  

The organizations that succeed will learn faster from what they build and build faster from what they learn. 

From Features to Outcomes, Speculation to Evidence 

Agile transformed how teams build software. It replaced long project plans with rapid sprints, continuous delivery, and an obsession with velocity. For years, we measured progress by how many features we shipped and how fast we shipped them. 

But shipping features doesn’t equal creating value. A feature only matters if it changes behavior or improves an outcome, and many don’t. As building gets easier, the hard part shifts to understanding which ideas truly create impact and why. 

AI-assisted and automated development now make that learning practical. Teams can generate several variations of an idea, test them quickly, and keep only what works best. The work of software development starts to look more like controlled experimentation. 

This changes how we measure success. The old ROI models relied on speculative forecasts and business cases built on assumptions about value, timelines, and adoption. We planned, built, and launched, but when the product finally reached users, both the market and the problem had already evolved. 

Now, ROI becomes something we earn through proof. We begin with a measurable hypothesis and build just enough to test it:  

If onboarding time falls by 30 percent, retention will rise by 10 percent,  
creating two million dollars in annual value.  

Each iteration provides evidence. Every proof point increases confidence and directs the next investment. In this way, value creation and validation merge, and the more effectively we learn, the faster our return compounds. 
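The hypothesis above can be expressed as a small value model. The baseline figures here are invented for illustration, chosen so the arithmetic lands on the article's two-million-dollar example:

```python
# Illustrative value model for the hypothesis:
# "If onboarding time falls by 30%, retention will rise by 10%,
#  creating two million dollars in annual value."
def annual_value(customers: int, revenue_per_customer: float,
                 retention_lift: float) -> float:
    """Incremental annual revenue from the customers retained by the change."""
    return customers * revenue_per_customer * retention_lift

# Hypothetical baseline: 10,000 customers at $2,000/year each.
value = annual_value(customers=10_000, revenue_per_customer=2_000,
                     retention_lift=0.10)
print(f"${value:,.0f}")  # $2,000,000
```

Writing the hypothesis as an explicit formula makes it falsifiable: each iteration updates the measured retention lift, and the projected value updates with it.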

ROI That Compounds 

ROI used to appear only after launch, when the project was declared “done.” It was calculated as an academic validation of past assumptions and decisions. The investment itself remained a sunk cost, viewed as money spent months ago. 

In an outcome-driven model, value begins earlier and grows with every iteration. Each experiment creates two returns: the immediate impact of what works and the insight gained from what doesn’t. Both make the next round more effective. 

Say you launched a small pilot with ten users. Within weeks, they’re saving time, finding shortcuts, and surfacing friction you couldn’t predict on paper. That feedback shapes the next version and builds the confidence to expand to a hundred users. Now, you can measure quantitative impact, like faster response times, fewer manual steps, and higher satisfaction. Payoff scales rapidly as the value curve steepens with each round of improvement. 

Moreover, you are measuring return continuously, using each cycle’s results as evidence to justify the next. In this way, return becomes the trigger for further investment, and the faster the team learns, the faster the return accelerates. 

Each step also leaves behind a growing library of reusable assets: validated designs, cleaner data, modular components, and refined decision logic. Together, these assets make the organization smarter and more efficient with each cycle. 

When learning and value grow together, ROI becomes a flywheel. Each iteration delivers a product that’s smarter, a team that’s sharper, and an organization more confident in where to invest next. To harness that momentum, we need reliable ways to measure progress and prove that value is growing with every step. 

Measuring Progress in an Outcome-Driven Model 

When ROI shifts from prediction to evidence, the way we measure progress has to change. Traditional business cases rely on financial projections meant to prove that an investment would pay off. In an outcome-driven model, those forecasts give way to leading indicators collected in real-time.  

Instead of measuring progress by deliverables and deadlines, we use signals that show we’re moving in the right direction. Each iteration increases confidence that we are solving the right problem, delivering the right outcome, and generating measurable value. 

That evidence evolves naturally with the product’s maturity. Early on, we look for behavioral signals, or proof that users see the problem and are willing to change. As traction builds, we measure whether those new behaviors produce the desired outcomes. Once adoption scales, we track how effectively the system converts those outcomes into sustained business value. 

You can think of it as a chain of evidence that progresses from leading to lagging indicators: 

Behavioral Change → Outcome Effect → Monetary Impact 

The challenge, then, is to create a methodology that exposes these signals quickly and enables teams to move through this progression with confidence, learning as they go. This process conceptually follows agile, but changes as the product evolves through four stages of maturity: 

Explore & Prototype → Pilot & Validate → Scale & Optimize → Operate & Monitor 

At each stage, teams iteratively build, test, and learn, advancing only when success is proven. What gets built, how it’s measured, and what “success” means evolve as the product matures. Early stages emphasize exploration and learning; later stages focus on optimizing outcomes and capturing value. Each transition strengthens both evidence that the product works and confidence in where to invest next. 
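The four-stage progression above can be sketched as a gate check, where a product advances only when the evidence required at its current stage is in hand. The stage names follow the article; the evidence keys are assumptions for illustration:

```python
# Stage gates: a product advances only when the evidence for its
# current stage exists. Evidence keys are illustrative assumptions.
STAGES = ["explore", "pilot", "scale", "operate"]
GATES = {
    "explore": "behavioral_evidence",  # users show willingness to change
    "pilot":   "outcome_evidence",     # measurable progress toward the outcome
    "scale":   "value_evidence",       # sustained impact on business KPIs
}

def next_stage(stage: str, evidence: set) -> str:
    """Return the next stage if its gate is satisfied, else stay put."""
    gate = GATES.get(stage)
    if gate and gate in evidence:
        return STAGES[STAGES.index(stage) + 1]
    return stage

print(next_stage("explore", {"behavioral_evidence"}))  # pilot
print(next_stage("pilot", set()))                      # pilot
```

The point of the gate is discipline: without the named evidence, the product stays where it is, no matter how finished it feels.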

1. Explore & Prototype:  

In the earliest stage, the goal is to prove potential. Teams explore the problem space, test assumptions, and build quick prototypes to expose what’s worth solving. The success measures are behavioral: evidence of user willingness and intent. Do users engage with early concepts, sign up for pilots, or express frustration with the current process? These signals de-risk demand and validate that the problem matters. 

The product moves to the next stage only with a clear, quantified problem statement supported by credible behavioral evidence. When users demonstrate they’re ready for change, the concept is ready for validation. 

2. Pilot & Validate:  

Here’s where a prototype turns into a pilot to test whether the proposed solution actually works. Real users perform real tasks in limited settings. The indicators are outcome-based. Can people complete tasks faster, make fewer errors, or reach better results? Each of these metrics ties directly to the intended outcome that the product aims to achieve. 

To advance from this stage, the pilot must show measurable progress towards the outcome. When that evidence appears, it’s time to expand. 

3. Scale & Optimize:  

As adoption grows, the focus shifts from proving the concept to demonstrating outcomes and refining performance. Every new user interaction generates evidence that helps teams understand how the product creates impact and where it can improve. 

Learning opportunities emerge from volume. Broader usage reveals edge cases, hidden friction points, and variations that allow teams to refine the experience, calibrate models, automate repetitive tasks, and strengthen outcome efficacy. 

At this stage, value indicators connect usage to business KPIs like faster response times, higher throughput, improved satisfaction, and lower support costs. This is where value capture compounds. As more users adopt the product, the value they generate accumulates, proving that the system delivers significant business impact. 

The product reaches the next level of maturity when it shows sustained, reliable impact on outcome measures across widespread usage. 

4. Operate & Monitor:  

In the final stage, the emphasis shifts from optimization to observation. The system is stable, but the environment and user needs continue to evolve and erode effectiveness over time. The goal is twofold: ensure that value continues to be realized and detect the earliest signals of change. 

The indicators now focus on sustained ROI and performance integrity. Teams track metrics that show ongoing return (cost savings, revenue contribution, efficiency gains) while monitoring usage patterns, engagement levels, and model accuracy. 

When anomalies appear (drift in outcomes, declining engagement, or new behaviors), they become the warning signs of changing user needs. Each anomaly hints at a new opportunity and loops the team back into exploration. This begins the next cycle of innovation and validation. 
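Detecting those earliest signals of change can start with something as simple as comparing a recent window of an outcome metric against its historical baseline. A minimal sketch, where the 10 percent tolerance and the metric are assumptions:

```python
# Minimal drift check: flag when the recent mean of an outcome metric
# deviates from its historical baseline by more than a tolerance.
# The 10% tolerance and the sample data are illustrative assumptions.
def drifted(history: list[float], recent: list[float],
            tolerance: float = 0.10) -> bool:
    baseline = sum(history) / len(history)
    current = sum(recent) / len(recent)
    return abs(current - baseline) > tolerance * abs(baseline)

task_success = [0.91, 0.90, 0.92, 0.89, 0.91]  # historical completion rate
last_week = [0.78, 0.80, 0.79]                 # recent window
print(drifted(task_success, last_week))  # True: loop back into exploration
```

Production monitoring would use proper statistical tests and more metrics, but even this crude check turns "the environment evolves" into an actionable signal.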

From Lifecycle to Flywheel: How ROI Becomes Continuous 

Across these stages, ROI becomes a continuous cycle of evidence that matures alongside the product itself. Each phase builds on the one before it.  

Together, these stages form a closed feedback loop—or flywheel—where evidence guides investment. Every dollar spent produces both impact and insight, and those insights direct the next wave of value creation. The ROI conversation shifts from “Do you believe it will pay off?” to “What proof have we gathered, and what will we test next?” 

From ROI to Investment Upon Return 

AI and automation have made building easier than ever before. The effort that once defined software development is no longer the bottleneck. What matters now is how quickly we can learn, adapt, and prove that what we build truly works. 

In this new environment, ROI becomes a feedback mechanism. Returns are created early, validated often, and reinvested continuously. Each cycle of discovery, testing, and improvement compounds both value and understanding, and creates a lasting continuous advantage. 

This requires a mindset shift as much as a process shift: from funding projects based on speculative confidence in a solution to funding them based on their ability to generate proof. When return on investment becomes investment upon return, the economics of software change completely. Value and insight grow together. Risk declines with every iteration. 

When building becomes easy, learning fast creates the competitive advantage. 



The New Equations 


Key Takeaways  


FAQs  

What does “software cheap enough to waste” mean? 
It describes a new phase in software development where AI and automation have made building fast, low-cost, and low-risk, allowing teams to experiment more freely and learn faster. 

Why does cheaper software matter for innovation? 
When building is inexpensive, experimentation becomes affordable. Teams can test more ideas, learn from data, and refine products that actually work for people. 

How does this change ROI in software development? 
Traditional ROI measured delivery and cost efficiency. Evidential ROI measures learning, outcomes, and validated impact: value that grows with each iteration. 

What are Return on Learning and Return on Ecosystem? 
Return on Learning measures how quickly teams adapt and improve through cycles of experimentation. Return on Ecosystem measures how insights spread and create shared success across teams. 

What’s the main takeaway for leaders? 
AI and automation have changed the rules. The winners will be those who learn the fastest, not those who build the most. 

Beyond Wrappers: What Protocols Leave Unsolved in AI Systems 

I recently built a Model Context Protocol (MCP) integration for my Oura Ring. Not because I needed MCP, but because I wanted to test the hype: Could an AI agent make sense of my sleep and recovery data? 

It worked. But halfway through I realized something. I could have just used the Oura REST API directly with a simple wrapper. What I ended up building was basically the same thing, just with extra ceremony. 
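For comparison, the "simple wrapper" alternative is roughly this much code. The endpoint path follows Oura's public v2 API, but treat the details as assumptions to verify against the current docs; only request construction is shown here, with no network call:

```python
import urllib.request

# A bare-bones REST wrapper. The Oura v2 path below is an assumption
# based on its public API documentation; verify before relying on it.
BASE = "https://api.ouraring.com/v2/usercollection"

def build_request(resource: str, token: str) -> urllib.request.Request:
    """Construct an authenticated GET request; callers invoke urlopen themselves."""
    return urllib.request.Request(
        f"{BASE}/{resource}",
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_request("daily_sleep", token="YOUR_TOKEN")
print(req.full_url)  # https://api.ouraring.com/v2/usercollection/daily_sleep
# To fetch: json.load(urllib.request.urlopen(req))
```

An agent with native function calling can use a wrapper like this directly; the MCP version adds a server, a transport, and a schema around the same GET request.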

As someone who has architected enterprise AI systems, I understand the appeal. Reliability isn’t optional, and protocols like MCP promise standardization. To be clear, MCP wasn’t designed to fix hallucinations or context drift. It’s a coordination protocol. But the experiment left me wondering: Are we solving the real problems or just adding layers? 

The Wrapper Pattern That Won’t Go Away 

MCP joins a long list of frameworks like LangChain, LangGraph, SmolAgents, and LlamaIndex, each offering a slightly different spin on coordination. But at heart, they’re all wrappers around the same issue: getting LLMs to use tools consistently. 

Take CrewAI. On paper, it looked elegant with agents organized into “crews,” each with roles and tools. The demos showed frictionless orchestration. In practice? The agents ignored instructions, produced invalid JSON even after careful prompting, and burned days in debugging loops. When I dropped down to a lower-level tool like LangGraph, the problems vanished. CrewAI’s middleware hadn’t added resilience; it had hidden the bugs. 

This isn’t an isolated frustration. Billions of dollars are flowing into frameworks while fundamentals like building reliable agentic systems remain unsettled. MCP risks following the same path. Standardizing communication may sound mature, but without solving hallucinations and context loss, it’s just more scaffolding on shaky foundations. 

What We’re Not Solving 

The industry has been busy launching integration frameworks, yet the harder challenges remain stubbornly in place: 

As CData notes, these aren’t just implementation gaps. They’re fundamental challenges. 

What the Experiments Actually Reveal 

Working with MCP brought a sharper lesson. The difficulty isn’t about APIs or data formats. It’s about reliability and security. 

When I connected my Oura data, I was effectively giving an AI agent access to intimate health information. MCP’s “standardization” amounted to JSON-RPC endpoints. That doesn’t address the deeper issue: How do you enforce “don’t share my health data” in a system that reasons probabilistically? 
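One partial answer is to enforce such rules outside the model entirely: a deterministic filter that redacts sensitive fields before tool output ever reaches the agent's context, rather than asking a probabilistic system to honor a policy. A sketch, with the field names as assumptions:

```python
# Deterministic policy layer: strip sensitive fields before the model
# sees them, instead of prompting it to "please not share" them.
# Field names are illustrative assumptions, not a real Oura schema.
SENSITIVE_FIELDS = {"heart_rate", "hrv", "sleep_stages"}

def redact(record: dict, allowed: set) -> dict:
    """Drop sensitive keys unless the user has explicitly allowed them."""
    return {k: v for k, v in record.items()
            if k not in SENSITIVE_FIELDS or k in allowed}

raw = {"date": "2025-06-01", "score": 82, "heart_rate": 47}
print(redact(raw, allowed=set()))  # {'date': '2025-06-01', 'score': 82}
```

Code like this guarantees what a prompt can only request, which is the architectural distinction protocols by themselves do not address.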

To be fair, there’s progress. Auth0 has rolled out authentication updates, and Anthropic has improved Claude’s function-calling reliability. But these are incremental fixes. They don’t resolve the architectural gap that protocols alone can’t bridge. 

The Evidence Is Piling Up 

The risks aren’t theoretical anymore. Security researchers keep uncovering cracks. 

Meanwhile, fragmentation accelerates. Merge.dev lists half a dozen MCP alternatives. Zilliz documents the “Great AI Agent Protocol Race.” Every new protocol claims to patch what the last one missed. 

Why This Goes Deeper Than Protocol Wars 

The adoption curve is steep. Academic analysis shows MCP servers grew from around 1,000 early this year to over 14,000 by mid-2025. With $50B+ in AI funding at stake, we’re not just tinkering with middleware; we’re building infrastructure on unsettled ground. 

Protocols like MCP can be valuable scaffolding. Enterprises with many tools and models do need coordination layers. But the real breakthroughs come from facing harder questions head-on: 

These problems exist no matter the protocol. And until they’re addressed, standardization risks becoming a distraction. 

The question isn’t whether MCP is useful; it’s whether the focus on protocol standardization is proportional to the underlying challenges. 

So Where Does That Leave Us? 

There’s nothing wrong with building integration frameworks. They smooth edges and create shared patterns. But we should be honest about what they don’t solve. 

For many use cases, native function calling or simple REST wrappers get the job done with less overhead. MCP helps in larger enterprise contexts. Yet the core challenges, reliability and security, remain active research problems. 

That’s where the true opportunity lies. Not in racing to the next protocol, but in tackling the questions that sit at the heart of agentic systems. 

Protocols are scaffolding. They’re not the main event. 

Learn more about Agentic AI. 

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request a strategy session.  

Stop Measuring AI Success by Lines of Code: The Real ROI is in the Boring Stuff 

The headlines are hard to miss: “AI-powered code generation boosting developer velocity by 30%.” Lines of code written per hour skyrocketing. Teams shipping features faster than ever. 

Yet the most significant returns aren’t showing up in those flashy metrics. The real ROI is emerging in places far less glamorous: the work that usually gets postponed, rushed, or quietly skipped. 

The Quality Underground 

While much attention is placed on code generation speed, something more consequential is happening behind the scenes. AI is proving most valuable when it tackles the tedious but essential work developers often deprioritize. 

Test creation. Documentation updates. Boilerplate scaffolding. The quiet foundations of reliable software. 

When testing becomes easier, teams actually do it. When documentation updates itself, it actually stays current. Organizations using AI-augmented testing report 50% lower costs and 60% faster test cycles¹. That’s more than efficiency. It’s a shift in quality assurance discipline. 

A clear pattern is emerging: the less exciting the task, the greater the AI payoff. 

The Multiplier Effect 

This is where traditional measurements fall short. Counting lines of code tells us little about stability. Shipping features faster is less impressive if those features fail in production. 

By contrast, metrics like test coverage and documentation completeness tell a different story. They reveal AI as a speed accelerator and a quality multiplier. 

Some organizations are already seeing dramatic improvements, with test coverage climbing from 60% to 85%, documentation kept current for the first time in years, and edge cases automatically captured. 

The takeaway is straightforward. AI makes developers quicker, and it makes the software they build more reliable. 

The Tasks That Actually Matter 

Consider the flow of software development. Writing business logic is often the easy part. The heavier lift comes in the margins: building robust test suites, maintaining documentation, handling edge cases thoroughly. 

These are the tasks that are critical for quality, slow to complete, and frequently sacrificed under pressure. They are also the exact tasks where AI thrives. 

Take test generation. Creating comprehensive tests often takes longer than the code itself, demanding developers think through failures and integration scenarios. AI can analyze code patterns, detect gaps, and generate tests that human teams might overlook. The result is not just faster coverage, but broader and more consistent coverage. 
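As a concrete, hedged illustration of that gap analysis, consider a small hypothetical helper (`safe_divide` is invented for this sketch). The tests below are the kind an AI assistant might propose: the happy path plus the failure modes that tend to get skipped under deadline pressure.

```python
# Hypothetical helper under test; not from any real codebase.
def safe_divide(numerator, denominator, default=None):
    """Divide two numbers, returning `default` when division is undefined."""
    if denominator == 0:
        return default
    return numerator / denominator

# The kind of edge-case suite an AI assistant might generate:
# the obvious case plus the boundaries humans often overlook.
def test_safe_divide():
    assert safe_divide(10, 2) == 5.0           # happy path
    assert safe_divide(10, 0) is None          # division by zero -> default
    assert safe_divide(10, 0, default=0) == 0  # caller-supplied fallback
    assert safe_divide(-9, 3) == -3.0          # negative numerator
    assert safe_divide(0, 5) == 0.0            # zero numerator

test_safe_divide()
print("all edge-case tests passed")
```

None of these cases is hard to write; the point is that a generator proposes all of them consistently, every time, for every function.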

The Measurement Revolution 

This shift creates an opening to rethink how AI success is measured. Instead of tracking raw velocity, organizations are following quality indicators: 

Test coverage across the codebase. Documentation completeness and currency. Edge cases captured by generated tests. Production defect rates. 

These indicators surface AI’s true value: not simply producing more code but producing better software. 
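A minimal sketch of what tracking such indicators might look like in practice. The coverage figures echo the example earlier in this piece; the other field names and numbers are assumptions for illustration, not output from any real tool.

```python
# Illustrative quality-indicator snapshot; field names and most figures
# are assumptions for this sketch (only the coverage jump from 60% to 85%
# comes from the article's example).
indicators = {
    "test_coverage_pct":        {"before": 60,  "after": 85},
    "docs_up_to_date_pct":      {"before": 40,  "after": 95},
    "edge_cases_in_test_suite": {"before": 120, "after": 310},
}

# Report the before/after movement for each indicator.
for name, values in indicators.items():
    delta = values["after"] - values["before"]
    print(f"{name}: {values['before']} -> {values['after']} ({delta:+d})")
```

The mechanics are trivial; the discipline is in collecting these numbers every sprint instead of counting lines of code.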

The Compound Returns 

Quality improvements have a different kind of payoff: they compound. 

Faster code generation saves time today. Stronger test coverage prevents costly failures tomorrow. Automated documentation reduces onboarding time next quarter. Better quality controls fuel faster iteration next year. 

Measured through this lens, AI’s impact becomes clearer. A 50% drop in production bugs delivers far greater financial benefit than a 50% increase in code generation speed. 
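To make that comparison concrete, here is a back-of-envelope sketch. Every number in it (bug counts, bug cost, hours, rates) is a deliberately invented assumption, not a benchmark; the point is the shape of the math, not the figures.

```python
# Back-of-envelope ROI comparison; every number below is an assumption.
bugs_per_quarter = 40
cost_per_production_bug = 5_000   # triage + fix + customer impact ($)
dev_hours_on_codegen = 500        # hours/quarter spent writing new code
hourly_rate = 100                 # fully loaded developer cost ($/hour)

# Scenario A: 50% fewer production bugs.
savings_fewer_bugs = 0.5 * bugs_per_quarter * cost_per_production_bug

# Scenario B: 50% faster code generation.
# Same output in 1/1.5 of the time, so one third of those hours are saved.
savings_faster_codegen = dev_hours_on_codegen * hourly_rate * (1 - 1 / 1.5)

print(f"50% fewer bugs:     ${savings_fewer_bugs:,.0f} per quarter")
print(f"50% faster codegen: ${savings_faster_codegen:,.0f} per quarter")
```

Under these (invented) assumptions, the bug reduction is worth several times the speed gain, and the gap widens as the cost of a production failure grows.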

The Quality Advantage 

Teams focusing here are building something rare: systematic quality improvement woven into the development process itself. 

Others may continue to compete on speed, but organizations that compete on reliability are building resilience. They’re lowering technical debt instead of accumulating it. They’re creating the conditions for sustainable experimentation. 

Over time, that advantage compounds into a moat that’s hard to cross. 

Reframing Success 

When the next report touts impressive AI coding velocity, a different question is worth asking: “What is happening to quality?” 

Because real AI transformation isn’t about developers typing faster. It’s about software that’s more dependable, because the unglamorous work is finally being done. 

Organizations that see this are measuring the right outcomes. They’re finding that the “boring” tasks create the most durable advantages. Those are often the ones that matter most when customers decide whose product they trust. 


Sources: 

  1. Unisys, ROI of Generative AI in Software Testing, 2024