Jeff Kirk Named Executive Vice President of Applied AI at Robots & Pencils 

From Alexa to Emma, Kirk brings two decades of AI breakthroughs that have reshaped industries. Now he’s powering Robots & Pencils’ rise in the intelligence age. 

Robots & Pencils, an AI-first, global digital innovation firm specializing in cloud-native web, mobile, and app modernization, today announced the executive appointment of Jeff Kirk as Executive Vice President of Applied AI. A seasoned technology leader with a career spanning global agencies, startups, and Fortune 100 enterprises, Kirk steps into this newly created role to accelerate the firm’s AI-first vision and unlock transformative outcomes for clients. As EVP of Applied AI, Kirk will lead the firm’s strategy and delivery of AI-powered products and enterprise AI solutions across industries.

Explore how Robots & Pencils blends science and design to build market leaders. 

Kirk’s track record speaks for itself, with AI breakthroughs that fueled customer engagement and business growth. He founded and scaled Moonshot, an intelligent digital products company later acquired by Pactera, where he spearheaded next-generation experiences in voice, augmented reality, and enterprise digitalization. At Amazon, he served as International Product & Technology Lead for Alexa, driving AI-powered personal assistant expansion to millions of households and users worldwide. Most recently, at bswift, Kirk led AI & Data as VP, delivering conversational AI breakthroughs with the award-winning Emma assistant and GenAI-powered EnrollPro decision support system. 

Across each of these roles runs a common thread. Kirk builds and scales innovations that transform how industries work, creating technologies that move from experimental to essential at breathtaking speed. 

“Jeff has been at the frontier of every major shift in digital innovation,” said Len Pagon, CEO of Robots & Pencils. “From shaping the future of eCommerce and mobile platforms at Brulant and Rosetta, to pioneering global voice AI at Amazon, to launching AI-driven customer experiences at bswift, Jeff has consistently delivered what’s next. He doesn’t just talk about AI. He builds products that millions use every day. With Jeff at the helm of Applied AI, Robots & Pencils is sharpening its challenger edge, helping clients leap ahead while legacy consultancies struggle to catch up. I’m energized by what this means for our clients and inspired by what it means for our people.” 

Across two decades, Kirk has built a reputation for translating complex business requirements into enterprise-grade AI and technology solutions that scale, stick, and generate measurable results. His entrepreneurial mindset and hands-on leadership style uniquely position him to help clients experiment, activate, and operate AI across their businesses. 

“Organizations and their workers are under pressure to innovate on behalf of customers while simultaneously learning to work with a new type of co-worker: artificial intelligence,” said Kirk. “The steps we take together to learn to work differently will lead to the most outsized innovation in our industries. I’m thrilled to join Robots & Pencils to push the boundaries of what’s possible with AI, to deliver outcomes that matter for our clients and their customers, and to create opportunities for our teams to do the most meaningful work of their careers.” 

Kirk began his career at Brulant and Rosetta, where he worked alongside Pagon and other Robots & Pencils executive team members, leading engineering and solutions architecture across content, commerce, mobile, and social platforms. His return to the fold marks both a reunion and a reinvention, positioning Robots & Pencils as a leader in applied AI at scale.

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request an AI briefing.  

The $150K PDF That Nobody Reads: From Research Deliverables to Living Systems 

A product executive slides open her desk drawer. Tucked between old cables and outdated business cards is a thick, glossy report. The binding is pristine, the typography immaculate, the insights meticulously crafted. Six figures well spent, at least according to the invoice. Dust motes catch the light as she lifts it out: a monument to research that shaped… nothing, influenced… no one, and expired the day it was delivered. 

It’s every researcher’s quiet fear. The initiative they poured months of work, a chunk of their sanity, and about a thousand sticky notes into becomes shelf-ware. Just another artifact joining strategy decks and persona posters that never found their way into real decisions. 

This is the way research has been delivered for decades, by global consultancies, boutique agencies, and yes, even by me. At $150K a report, it sounds extravagant. But when you consider the sheer effort, the rarity of the talent involved, and the stakes of anchoring business decisions in real customer insight, it’s not hard to see why leaders sign the check. 

The issue isn’t the value of the research. It’s the belief that insights should live in documents at all. 

Research as a Living System 

Now picture a different moment. The same executive doesn’t reach for a drawer. She opens her laptop and types: “What causes the most friction when ordering internationally?” 

Within seconds she’s reviewing tagged quotes from dozens of interviews, seeing patterns of friction emerge, even testing new messaging against synthesized persona responses. The research isn’t locked in a PDF. It’s alive, queryable, and in motion. 

This isn’t a fantasy. It’s the natural evolution of how research should work: not as one-time deliverables, but as a living system.

The numbers show why change is overdue. Eighty percent of Research Ops & UX professionals use some form of research repository, but over half report fair or poor adoption. The tools are frustrating, time-consuming to maintain, and lack clear ownership. Instead of mining the insights they already have, teams commission new studies, resulting in an expensive cycle of creating artifacts that sit idle while decisions move on without them.

It’s a Usability Problem 

Research hasn’t failed because of weak insights. It’s been constrained by the static format of reports. Once findings are bound in a PDF or slide deck, the deliverable has to serve multiple audiences at once, and it starts to bend under its own weight. 

For executives, the executive summary provides a clean snapshot of findings. But when the time comes to make a concrete decision, the summary isn’t enough. They have to dive into the hundred-page appendix to trace back the evidence, which slows down the moment of action. 

On the other hand, product teams don’t need summaries; they need detailed insights for the feature they’re building right now. In long static reports, those details are often buried or disconnected from their workflow. Sometimes they don’t even realize the answer exists at all, so the research goes unused, or even gets repeated. An insight that can’t be surfaced when it’s needed might as well not exist.

The constraint isn’t the quality of the research. It’s the format. Static deliverables fracture usability across audiences and leave each group working harder than they should to put insights into play. 

Research as a Product 

While we usually view research as an input into products, research itself is a product too. And with a product mindset, there is no “final deliverable,” only an evolving body of user knowledge that grows in value over time. 

In this model, the researcher acts as a knowledge steward of the user insight “product,” curating, refining, and continuously delivering customer insights to their users: the executives, product managers, designers, and engineers who need insights in different forms and at different moments. 

Like any product, research needs a roadmap. It has gaps to fill, like user groups not yet heard from, or behaviors not yet explored. It has features to maintain, like transcripts, coded data, and tagged insights. And it has adoption goals, because insights only create value when people use them.

This approach transforms reports too. A static deck becomes just a temporary framing of the knowledge that already exists in the system. With AI, you can auto-generate the right “version” of research for the right audience, such as an executive summary for the C-suite, annotations on backlog items for product teams, or a user-centered evaluation for design reviews. 

Treating research as a product also opens the door to continuous improvement. A research backlog can track unanswered questions, emerging themes, and opportunities for deeper exploration. Researchers can measure not just delivery (“did we produce quality insights?”) but usage (“did the insights influence a decision?”). Over time, the research “product” compounds in value, becoming a living, evolving system rather than a series of static outputs. 

This new model requires a new generation of tools. AI can now cluster themes, surface patterns, simulate persona responses, and expose insights through natural Q&A. AI makes the recomposition of insights into deliverables cheap. That allows us to focus on how our users get the insights they need in the way they need them. 
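To make that concrete, here’s a minimal sketch of what a queryable insight store can look like, assuming a simple tagged-quote repository and the Anthropic Python SDK; the repository shape, tags, and model id are illustrative stand-ins, not a real research platform.

```python
# A minimal sketch, not a product: a tagged-quote store plus an LLM call.
# Assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment;
# the repository fields, tags, and model id are illustrative.
import anthropic

repository = [
    {"quote": "Customs fees surprised me at checkout.", "tags": ["international", "pricing"]},
    {"quote": "Tracking stopped updating once the package left the country.", "tags": ["international", "logistics"]},
    {"quote": "I abandoned my cart when shipping quoted three weeks.", "tags": ["international", "logistics"]},
]

def ask(question: str, tag: str) -> str:
    """Pull tagged evidence, then let the model synthesize a cited answer."""
    evidence = "\n".join(f"- {item['quote']}" for item in repository if tag in item["tags"])
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: use your deployed model
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nTagged interview quotes:\n{evidence}\n\n"
                       "Answer using only this evidence, citing the quotes you rely on.",
        }],
    )
    return message.content[0].text

print(ask("What causes the most friction when ordering internationally?", "international"))
```

The point isn’t the specific stack: once insights live as structured, tagged data, recomposing them for a new question or audience becomes a cheap query rather than a new study.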

From Deliverable to Product 

Treating research as a product changes the central question. It’s no longer, “What should this report contain?” but “What questions might stakeholders need to answer, and how do we make those answers immediately accessible?” 

When research is built for inquiry, every transcript, survey, and usability session becomes part of a living knowledge base that compounds in value over time. Success shifts too: it’s measured not by the number of reports delivered, but by how often insights are pulled into decisions. A six-figure investment should inform hundreds of critical choices, not one presentation that fades into archives.

And here’s the irony: the product mindset actually produces better reports as well. When purpose-built reports focus as much on their usage as on the information they contain, they become invaluable components of the software production machine.

Research itself isn’t broken. It just needs a product mindset and AI-based qualitative analysis tools that turn insights into a living system, not a slide deck.

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request a strategy session.  

How Agentic AI Is Rewiring Higher Education 

A University Without a Nervous System 

Walk through the back offices of most universities, and you will see the challenge. Admissions runs on one platform, advising on another, learning management on a third, and academic affairs on a fourth. Each system functions, yet little connects them. Students feel the gaps when financial aid processing is delayed, academic records are incomplete, and support processes remain confusing and slow. Leaders feel it in the cost of complexity and the weight of compliance. 

Higher education institutions typically manage dozens of disconnected systems. IT leaders face persistent integration challenges that consume substantial staff time and budget while creating operational bottlenecks that affect both student services and institutional agility.

For decades, CIOs and CTOs have been tasked with stitching these systems together. Progress came in patches, with integrations here and dashboards there. What emerged looked more like scar tissue than connective tissue. Patchwork technology blocks digital transformation in higher education, and leaders now seek infrastructure that can unify rather than just connect. 

The Rise of Agentic AI as Connective Tissue 

Agentic AI wires the university together. Acting like a nervous system, it routes information and triggers actions throughout the institution, coordinating workflows through intelligent routing and contextual decision-making. Unlike traditional automation that follows rigid rules, agentic AI systems can make contextual decisions, learn from outcomes, and coordinate across multiple platforms without constant human oversight. 

In practice, this means a transfer request automatically verifies transcripts through the National Student Clearinghouse, cross-references degree requirements in the SIS, flags discrepancies for staff to review, and updates student records, typically reducing processing time from 5-7 days to under 24 hours while maintaining accuracy. It means an advising system can recognize a retention risk, trigger outreach, and log the interaction without human staff piecing the puzzle together by hand. 
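As a schematic illustration only, the flow described above might be orchestrated like the sketch below; the integrations are stubbed, and every name is a hypothetical stand-in for real campus systems such as the Clearinghouse and the SIS.

```python
# A schematic sketch of the transfer-credit flow described above. The
# integrations are stubbed; every name here is hypothetical and stands in
# for real campus systems.
from dataclasses import dataclass, field

@dataclass
class TransferRequest:
    student_id: str
    transcript_verified: bool = False
    discrepancies: list[str] = field(default_factory=list)

def verify_transcript(req: TransferRequest) -> None:
    req.transcript_verified = True  # stub: call the Clearinghouse API here

def cross_reference_degree_requirements(req: TransferRequest) -> None:
    # stub: compare transferred credits against SIS degree requirements
    req.discrepancies.append("ENG 201: no direct equivalent found")

def process(req: TransferRequest) -> str:
    verify_transcript(req)
    cross_reference_degree_requirements(req)
    if req.discrepancies:
        # The agent routes exceptions to a human; it does not decide alone.
        return f"escalated to advisor: {req.discrepancies}"
    return "records updated automatically"

print(process(TransferRequest(student_id="S-1024")))
```

The design point is the escalation path: routine cases complete on their own, while discrepancies surface for human review.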

Agentic AI needs a strong foundation. That foundation is cloud-native infrastructure for universities that’s built to scale during peak demand, enforce compliance, and keep every action visible. With this base in place, universities move from pilot projects to production systems. The result is infrastructure that holds under pressure and adapts when conditions change. 

The Brain Still Decides 

A nervous system does not think on its own. It carries signals to the brain, where decisions are made. In the university context, the brain is still human, made up of faculty, advisors, administrators, and executives.

This is where the design philosophy matters. Agentic AI should amplify human capacity, not replace it. Advisors can spend more time in meaningful conversations with students because degree audits and schedule planning run on their own. CIOs can focus on strategic alignment because monitoring and audit logs are captured automatically. The architecture creates space for judgment, and it also creates space for human connection that strengthens the student experience. 

However, this transition requires careful change management. Faculty often express concerns about AI decision-making transparency, while staff worry about job displacement. Successful implementations address these concerns through clear governance frameworks, explainable AI requirements, and retraining programs that position staff as AI supervisors rather than replacements. 

What Happens When Signals Flow Freely 

When agentic systems begin to carry the load, universities see a different rhythm. Transcript processing moves with speed. Advising interactions trigger at the right time. Students find support without friction. Leaders gain resilience as workflows carry themselves from start to finish. What emerges is more than efficiency. It is an institution that thinks and acts as one, with every part working in concert to support the student journey. 

Designing for Resilience and Trust 

CIOs and CTOs recognize that orchestration brings new responsibility. Data must be structured and governed, with student information requiring FERPA-compliant handling throughout all automated processes. Agents must be observable and auditable. Compliance cannot live as a separate checklist but as a property of the system itself. AWS-native controls, from encryption to identity management, provide the levers to design with security as a default rather than a bolt-on.

At the same time, leaders must design for operational trust. A nervous system functions only when signals are reliable. This requires real-time monitoring dashboards, clear escalation protocols when agents encounter exceptions, and audit trails that document every automated decision. 

The Next Chapter of Higher Education Infrastructure 

What is happening now is less about another wave of apps and more about a shift in the foundation of the institution. Agentic AI is beginning to operate as infrastructure. It connects the university’s digital systems into something coordinated and adaptive. 

The role of leadership is to decide how that nervous system will function, and what kind of human judgment it will amplify. Presidents, provosts, CIOs, and CTOs who recognize this shift will shape not only the student experience but the operational resilience of their institutions for years to come. 

For leaders evaluating agentic AI initiatives, three factors determine readiness.  

Institutions strong in all three areas see faster implementation and higher adoption rates. 

The institutions that succeed will be those that view agentic AI not as a technology project, but as an organizational transformation requiring new governance models, staff capabilities, and student engagement strategies. 

When the nervous system works, the signals move freely, and people do their best work. Students find support when they need it. Advisors focus on real conversations. Leaders see further ahead. That is the promise of agentic AI in higher education, not machines in charge, but machines carrying the load so people can do what only people can do. 

Join Us

Join us at ASU’s Agentic AI and the Student Experience conference. Contact us to book time with our leaders and explore how agentic AI can strengthen your institution. 

Request an AI Briefing.  

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Learn more about Robots & Pencils AI Solutions for Education. 

Beyond Wrappers: What Protocols Leave Unsolved in AI Systems 

I recently built a Model Context Protocol (MCP) integration for my Oura Ring. Not because I needed MCP, but because I wanted to test the hype: Could an AI agent make sense of my sleep and recovery data? 

It worked. But halfway through I realized something. I could have just used the Oura REST API directly with a simple wrapper. What I ended up building was basically the same thing, just with extra ceremony. 
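For comparison, here’s roughly what that “simple wrapper” looks like: a direct call to Oura’s v2 REST API exposed as an ordinary function that native function calling can invoke. Treat the endpoint and token handling as illustrative rather than production code.

```python
# The "simple wrapper" alternative: a direct call to Oura's v2 REST API,
# exposed as an ordinary function an agent can invoke via native function
# calling. Endpoint and auth details are illustrative, not production code.
import os
import requests

def get_daily_sleep(start_date: str, end_date: str) -> dict:
    """Fetch daily sleep summaries between two ISO dates (YYYY-MM-DD)."""
    response = requests.get(
        "https://api.ouraring.com/v2/usercollection/daily_sleep",
        headers={"Authorization": f"Bearer {os.environ['OURA_TOKEN']}"},
        params={"start_date": start_date, "end_date": end_date},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

print(get_daily_sleep("2025-01-01", "2025-01-07"))
```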

As someone who has architected enterprise AI systems, I understand the appeal. Reliability isn’t optional, and protocols like MCP promise standardization. To be clear, MCP wasn’t designed to fix hallucinations or context drift. It’s a coordination protocol. But the experiment left me wondering: Are we solving the real problems or just adding layers? 

The Wrapper Pattern That Won’t Go Away 

MCP joins a long list of frameworks like LangChain, LangGraph, SmolAgents, and LlamaIndex, each offering a slightly different spin on coordination. But at heart, they’re all wrappers around the same issue: getting LLMs to use tools consistently.

Take CrewAI. On paper, it looked elegant, with agents organized into “crews,” each with roles and tools. The demos showed frictionless orchestration. In practice? The agents ignored instructions, produced invalid JSON even after careful prompting, and burned days in debugging loops. When I dropped down to a lower-level tool like LangGraph, the problems vanished. CrewAI’s middleware hadn’t added resilience; it had hidden the bugs.

This isn’t an isolated frustration. Billions of dollars are flowing into frameworks while fundamentals like building reliable agentic systems remain unsettled. MCP risks following the same path. Standardizing communication may sound mature, but without solving hallucinations and context loss, it’s just more scaffolding on shaky foundations. 

What We’re Not Solving 

The industry has been busy launching integration frameworks, yet the harder challenges remain stubbornly in place: 

As CData notes, these aren’t just implementation gaps. They’re fundamental challenges. 

What the Experiments Actually Reveal 

Working with MCP brought a sharper lesson. The difficulty isn’t about APIs or data formats. It’s about reliability and security. 

When I connected my Oura data, I was effectively giving an AI agent access to intimate health information. MCP’s “standardization” amounted to JSON-RPC endpoints. That doesn’t address the deeper issue: How do you enforce “don’t share my health data” in a system that reasons probabilistically? 
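There are deterministic mitigations worth layering in; the sketch below filters sensitive fields out of tool results before they ever reach the model. The field names are hypothetical, and this narrows what the model can see rather than solving probabilistic leakage.

```python
# One deterministic mitigation: strip sensitive fields from tool output
# before it ever reaches the model. A sketch only; the field names are
# hypothetical, and this narrows exposure rather than solving the
# underlying probabilistic problem.
SENSITIVE_FIELDS = {"heart_rate", "hrv", "temperature_deviation"}

def redact(payload: dict) -> dict:
    """Drop sensitive keys recursively before handing data to the agent."""
    return {
        key: redact(value) if isinstance(value, dict) else value
        for key, value in payload.items()
        if key not in SENSITIVE_FIELDS
    }

raw = {"day": "2025-01-01", "score": 82, "hrv": 48, "detail": {"temperature_deviation": 0.3}}
print(redact(raw))  # {'day': '2025-01-01', 'score': 82, 'detail': {}}
```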

To be fair, there’s progress. Auth0 has rolled out authentication updates, and Anthropic has improved Claude’s function-calling reliability. But these are incremental fixes. They don’t resolve the architectural gap that protocols alone can’t bridge. 

The Evidence Is Piling Up 

The risks aren’t theoretical anymore. Security researchers keep uncovering cracks.

Meanwhile, fragmentation accelerates. Merge.dev lists half a dozen MCP alternatives. Zilliz documents the “Great AI Agent Protocol Race.” Every new protocol claims to patch what the last one missed. 

Why This Goes Deeper Than Protocol Wars 

The adoption curve is steep. Academic analysis shows MCP servers grew from around 1,000 early this year to over 14,000 by mid-2025. With $50B+ in AI funding at stake, we’re not just tinkering with middleware; we’re building infrastructure on unsettled ground. 

Protocols like MCP can be valuable scaffolding. Enterprises with many tools and models do need coordination layers. But the real breakthroughs come from facing harder questions head-on: 

These problems exist no matter the protocol. And until they’re addressed, standardization risks becoming a distraction. 

The question isn’t whether MCP is useful; it’s whether the focus on protocol standardization is proportional to the underlying challenges. 

So Where Does That Leave Us? 

There’s nothing wrong with building integration frameworks. They smooth edges and create shared patterns. But we should be honest about what they don’t solve. 

For many use cases, native function calling or simple REST wrappers get the job done with less overhead. MCP helps in larger enterprise contexts. Yet the core challenges, reliability and security, remain active research problems. 

That’s where the true opportunity lies. Not in racing to the next protocol, but in tackling the questions that sit at the heart of agentic systems. 

Protocols are scaffolding. They’re not the main event. 

Learn more about Agentic AI. 

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request a strategy session.  

Stop Measuring AI Success by Lines of Code: The Real ROI is in the Boring Stuff 

The headlines are hard to miss: “AI-powered code generation boosting developer velocity by 30%.” Lines of code written per hour skyrocketing. Teams shipping features faster than ever.

Yet the most significant returns aren’t showing up in those flashy metrics. The real ROI is emerging in places far less glamorous: the work that usually gets postponed, rushed, or quietly skipped. 

The Quality Underground 

While much attention is placed on code generation speed, something more consequential is happening behind the scenes. AI is proving most valuable when it tackles the tedious but essential work developers often deprioritize. 

Test creation. Documentation updates. Boilerplate scaffolding. The quiet foundations of reliable software. 

When testing becomes easier, teams actually do it. When documentation updates itself, it actually stays current. Organizations using AI-augmented testing report 50% lower costs and 60% faster test cycles¹. That’s more than efficiency. It’s a shift in quality assurance discipline. 

A clear pattern is emerging: the less exciting the task, the greater the AI payoff. 

The Multiplier Effect 

This is where traditional measurements fall short. Counting lines of code tells us little about stability. Shipping features faster is less impressive if those features fail in production. 

By contrast, metrics like test coverage and documentation completeness tell a different story. They reveal AI as a speed accelerator and a quality multiplier. 

Some organizations are already seeing dramatic improvements, with test coverage climbing from 60% to 85%, documentation kept current for the first time in years, and edge cases automatically captured. 

The takeaway is straightforward. AI makes developers quicker, and it makes the software they build more reliable. 

The Tasks That Actually Matter 

Consider the flow of software development. Writing business logic is often the easy part. The heavier lift comes in the margins: building robust test suites, maintaining documentation, handling edge cases thoroughly. 

These are the tasks that are critical for quality, slow to complete, and frequently sacrificed under pressure. They are also the exact tasks where AI thrives. 

Take test generation. Creating comprehensive tests often takes longer than the code itself, demanding developers think through failures and integration scenarios. AI can analyze code patterns, detect gaps, and generate tests that human teams might overlook. The result is not just faster coverage, but broader and more consistent coverage. 
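As a rough sketch of what that looks like in practice, the snippet below asks a model to draft pytest tests for a small function using the Anthropic Python SDK; the prompt shape and model id are illustrative assumptions, and generated tests remain a draft until a developer reviews and runs them.

```python
# A sketch of prompt-driven test generation with the Anthropic Python SDK;
# the prompt shape and model id are illustrative, and the output is a
# draft that a developer must still review and run.
import anthropic

source = '''
def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return round(price * (1 - percent / 100), 2)
'''

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder: use your deployed model
    max_tokens=800,
    messages=[{
        "role": "user",
        "content": "Write pytest tests for this function. Cover the happy path, "
                   "the boundaries (0 and 100), rounding, and invalid input:\n" + source,
    }],
)
print(message.content[0].text)
```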

The Measurement Revolution 

This shift creates an opening to rethink how AI success is measured. Instead of tracking raw velocity, organizations are following quality indicators:

These indicators surface AI’s true value: not simply producing more code but producing better software. 

The Compound Returns 

Quality improvements have a different kind of payoff: they compound. 

Faster code generation saves time today. Stronger test coverage prevents costly failures tomorrow. Automated documentation reduces onboarding time next quarter. Better quality controls fuel faster iteration next year.

Measured through this lens, AI’s impact becomes clearer. A 50% drop in production bugs delivers far greater financial benefit than a 50% increase in code generation speed. 

The Quality Advantage 

Teams focusing here are building something rare: systematic quality improvement woven into the development process itself. 

Others may continue to compete on speed, but organizations that compete on reliability are building resilience. They’re lowering technical debt instead of accumulating it. They’re creating the conditions for sustainable experimentation. 

Over time, that advantage compounds into a moat that’s hard to cross. 

Reframing Success 

When the next report touts impressive AI coding velocity, a different question is worth asking: “What is happening to quality?”

Because real AI transformation isn’t about developers typing faster. It’s about software that’s more dependable, because the unglamorous work is finally being done. 

Organizations that see this are measuring the right outcomes. They’re finding that the “boring” tasks create the most durable advantages. Those are often the ones that matter most when customers decide whose product they trust. 

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request a strategy session. 

Sources: 

  1. Unisys, ROI of Generative AI in Software Testing, 2024 

Beyond Story Points: Rethinking Software Engineering Productivity in the Age of AI 

Why traditional metrics fall short, and how modern frameworks like DORA and SPACE can guide better outcomes 

For years, engineering leaders have relied on familiar metrics to gauge developer performance: story points, bug counts, and lines of code. These measures offered a shared baseline, especially in Agile environments where estimation and output needed a common language. 

But in today’s AI-assisted world, those numbers no longer tell the full story. Performance isn’t just about volume or velocity. It’s about outcomes. Did the developer deliver the expected functionality, with the right quality, on time? That’s how we compensate today, and that’s still what matters. But how we measure those things must evolve.  

With tools like GitHub Copilot, Claude Code, and Cursor generating entire functions, tests, and documentation quickly, output is becoming less about what a developer types and more about what they model, validate, and evolve. 

The challenge for CIOs, CTOs, and SVPs of Engineering isn’t just adopting new tools. It’s rethinking how to measure effectiveness in a world where productivity is amplified by AI and complexity often hides behind automation. 

Why Traditional Metrics Break Down 

The future of measurement hinges on three categories: productivity, quality, and functionality. These have always been essential to evaluating engineering work. But in the AI era, we must measure them differently. That shift doesn’t mean abandoning objectivity; it means updating our tools. 

The problem isn’t that legacy metrics are useless. It’s that they’re easily gamed, misinterpreted, or disconnected from business value. 

At best, these metrics create noise. At worst, they drive harmful incentives, like rewarding speed over safety, or activity over alignment. 

Today’s AI-assisted workflows lack mature solutions for tracking whether functionality requirements, like EPICs and user stories, have been fully met. But new approaches, like multi-domain linking (MDL), are emerging to close that gap. Measurement is getting smarter, and more connected, because it has to. 

The Rise of Directional Metrics 

Modern frameworks like DORA and SPACE were built to address these gaps. 

DORA (DevOps Research and Assessment) focuses on four delivery metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service.

These measure delivery health, not just effort. They’re useful for understanding how efficiently and safely value reaches users. 
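To make the definitions concrete, here’s a toy computation of two DORA metrics from a deployment log; the record shape is a hypothetical stand-in for whatever your delivery pipeline actually emits.

```python
# A toy computation of two DORA metrics from a deployment log; the record
# shape is hypothetical, shown only to make the definitions concrete.
from datetime import date

deploys = [
    {"day": date(2025, 1, 6), "caused_incident": False},
    {"day": date(2025, 1, 8), "caused_incident": True},
    {"day": date(2025, 1, 9), "caused_incident": False},
    {"day": date(2025, 1, 13), "caused_incident": False},
]

weeks_observed = 1  # Jan 6-13 spans roughly one working week
deployment_frequency = len(deploys) / weeks_observed
change_failure_rate = sum(d["caused_incident"] for d in deploys) / len(deploys)

print(f"deploys/week: {deployment_frequency:.1f}")        # 4.0
print(f"change failure rate: {change_failure_rate:.0%}")  # 25%
```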

SPACE (developed by Microsoft Research) considers five dimensions: satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow.

SPACE offers a more holistic view, especially in cross-functional and AI-assisted teams. It acknowledges that psychological safety, cross-team communication, and real flow states often impact long-term output more than individual commits. 

AI Complicates the Picture 

AI tools don’t eliminate the need for metrics; they demand smarter ones. When an LLM can write 80% of the code for a feature, how do we credit the developer? By the number of keystrokes? Or by their judgment in prompting, curating, and validating what the tool produced? 

But here’s the deeper challenge: What if that feature doesn’t do what it was supposed to? 

In AI-assisted workflows: 

Productivity isn’t just about output; it’s about fitness to purpose. Without strong traceability between code, tests, user stories, and epics, it’s easy for teams to ship fast but fall short of the business goal. 

Many organizations today struggle to answer a basic question: Did this delivery actually fulfill the intended functionality? 

This is where multi-domain linking (MDL) and AI-powered traceability show promise. By connecting user stories, requirements, test cases, design artifacts, and even user feedback within a unified graph, teams can use LLMs to assess whether the output truly matches the input. 
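A minimal sketch of the idea, with hypothetical node ids rather than a published MDL schema: represent the links as edges, then scan for stories missing the links that evidence functionality.

```python
# A minimal sketch of multi-domain linking as a traceability graph; the
# node and edge shapes are hypothetical, not a published MDL schema.
links = [
    ("EPIC-12", "STORY-301"),   # epic -> user story
    ("STORY-301", "PR-88"),     # story -> code change
    ("STORY-301", "TEST-442"),  # story -> test case
    ("STORY-302", "PR-91"),     # story with code but, notably, no test
]

def linked(node: str, kind: str) -> bool:
    """True if `node` has at least one outgoing link whose id starts with `kind`."""
    return any(src == node and dst.startswith(kind) for src, dst in links)

for story in ("STORY-301", "STORY-302"):
    gaps = [kind for kind in ("PR-", "TEST-") if not linked(story, kind)]
    print(story, "gaps:", gaps or "none")
# STORY-302 surfaces with a missing test link: a functionality-coverage gap
```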

And this capability unlocks more than just better alignment; it opens the door to innovation. AI-assisted development enables organizations to build more complex, interconnected, and adaptive systems than ever before. As those capabilities expand, so too must our ability to measure their economic value. What applications can we now build that we couldn’t before? And what is that worth to the business?

That’s not a theoretical exercise. It’s the next frontier in engineering measurement. 

Productivity as a System, Not a Score 

The best engineering organizations treat productivity like instrumentation. No single number can tell you what’s working, but the right mix of signals can guide better decisions. That system must account for both delivery efficiency and functional alignment. High velocity is meaningless if the outcome doesn’t meet the requirements it was designed to fulfill. 

That means: 

Most importantly, it means aligning measurement to what matters: Did the product deliver value? Did it meet its intended function? Was the effort worth the outcome? Those are the questions that still define success and the ones our measurement frameworks must help answer. 

How to Start Rethinking Measurement 

If your metrics haven’t evolved alongside your tooling, here’s how to get started: 

AI is reshaping how software gets built. That doesn’t mean productivity can’t be measured. It means it must be measured differently. The leaders who shift from tracking motion to monitoring momentum will build faster, healthier, and more resilient engineering teams. 

Robots & Pencils: Measuring What Matters in an AI-Driven World 

At Robots & Pencils, we believe productivity isn’t a score; it’s a system. A system that must measure not just speed, but alignment. Did the output meet the requirements? Did it fulfill the epic? Was the intended functionality delivered? 

We help clients extend traditional measurement approaches to fit an AI-first world. That means combining DORA and SPACE metrics with functional traceability, such as linking code to requirements, outcomes to epics, and user stories to business results. 

Our secure, AWS-native platforms are already instrumented for this kind of visibility. And our teams are actively designing multi-domain models that give leaders better answers to the questions they care about most. 

As AI opens the door to applications we never thought were possible, our job is to help you measure what matters, including what’s newly possible. We don’t just help teams move faster. We help them build with confidence and prove it. 

Pilot, Protect, Produce: A CIO’s Guide to Adopting AI Code Tools 

How to responsibly explore tools like GitHub Copilot, Claude Code, and Cursor—without compromising privacy, security, or developer trust 

AI-assisted development isn’t a future state. It’s already here. Tools like GitHub Copilot, Claude Code, and Cursor are transforming how software gets built, accelerating boilerplate, surfacing better patterns, and enabling developers to focus on architecture and logic over syntax and scaffolding. 

The productivity upside is real. But so are the risks. 

For CIOs, CTOs, and senior engineering leaders, the challenge isn’t whether to adopt these tools—it’s how. Because without the right strategy, what starts as a quick productivity gain can turn into a long-term governance problem. 

Here’s how to think about piloting, protecting, and operationalizing AI code tools so you move fast, without breaking what matters. 

Why This Matters Now 

In a recent survey of more than 1,000 developers, 81% of engineers reported using AI assistance in some form, and 49% reported using AI-powered coding assistants daily. Adoption is happening organically, often before leadership even signs off. The longer organizations wait to establish usage policies, the more likely they are to lose visibility and control. 

On the other hand, overly restrictive mandates risk boxing teams into tools that may not deliver the best results and limit experimentation that could surface new ways of working. 

This isn’t just a tooling decision. It’s a cultural inflection point. 

Understand the Risk Landscape 

Before you scale any AI-assisted development program, it’s essential to map the risks: 

These aren’t reasons to avoid adoption. But they are reasons to move intentionally with the right boundaries in place. 

Protect First: Establish Clear Guardrails 

A successful AI coding tool rollout begins with protection, not just productivity. As developers begin experimenting with tools like Copilot, Claude, and Cursor, organizations must ensure that underlying architectures and usage policies are built for scale, compliance, and security. 

Consider: 

For teams ready to push further, Bedrock AgentCore offers a secure, modular foundation for building scalable agents with memory, identity, sandboxed execution, and full observability, all inside AWS. Combined with S3 Vector Storage, which brings native embedding storage and cost-effective context management, these tools unlock a secure pathway to more advanced agentic systems. 

Most importantly, create an internal AI use policy tailored to software development. It should define tool approval workflows, prompt hygiene best practices, acceptable use policies, and escalation procedures when unexpected behavior occurs. 

These aren’t just technical recommendations; they’re prerequisites for building trust and control into your AI adoption journey.

Pilot Intentionally 

Start with champion teams who can balance experimentation with critical evaluation. Identify low-risk use cases that reflect a variety of workflows: bug fixes, test generation, internal tooling, and documentation. 

Track results across three dimensions: 

Encourage developers to contribute usage insights and prompt examples. This creates the foundation for internal education and tooling norms. 

Don’t Just Test—Teach 

AI coding tools don’t replace development skills; they shift where those skills are applied. Prompt engineering, semantic intent, and architectural awareness become more valuable than line-by-line syntax. 

That means education can’t stop with the pilot. To operationalize safely: 

When used well, these tools amplify good developers. When used poorly, they obscure problems and inflate false productivity. Training is what makes the difference. 

Produce with Confidence 

Once you’ve piloted responsibly and educated your teams, you’re ready to operationalize with confidence. That means: 

Organizations that do this well won’t just accelerate development; they’ll build more resilient software teams. Teams that understand both what to build and how to orchestrate the right tools to do it. The best engineering leaders won’t mandate one AI tool or ban them altogether. They’ll create systems that empower teams to explore safely, evaluate critically, and build smarter together.

Robots & Pencils: Secure by Design, Built to Scale 

At Robots & Pencils, we help enterprise engineering teams pilot AI-assisted development with the right mix of speed, structure, and security. Our preferred LLM provider, Anthropic, was chosen precisely because we prioritize data privacy, source integrity, and ethical model design: values we know matter to our clients as much as productivity gains.

We’ve been building secure, AWS-native solutions for over a decade, earning recognition as an AWS Partner with a Qualified Software distinction. That means we meet AWS’s highest standards for reliability, security, and operational excellence while helping clients adopt tools like Copilot, Claude Code, and Cursor safely and strategically. 

We don’t just plug in AI; we help you govern it, contain it, and make it work in your world. From guardrails to guidance, we bring the technical and organizational design to ensure your AI tooling journey delivers impact without compromise. 

The Changing Role of the Computer Programmer 

How generative AI, cloud-native services, and intelligent orchestration are redefining the developer role and what it means for modern engineering teams 

In the early days of computing, programmers were indispensable because they were the only ones who could speak the language of machines. From punch cards to assembly language, software development was hands-on and highly specialized. Even as languages evolved, from COBOL and C to Java and C#, one thing stayed constant: developers wrote every line themselves. 

But that’s no longer true. And it hasn’t been for a while. 

Today, enterprise developers have access to an entirely new class of tools: generative AI, intelligent agents, and secure, cloud-native building blocks that reduce the need to write, or even see, large amounts of code. This shift isn’t superficial. It’s redefining the nature of software development itself. 

A recent Cornell University study reports that AI now generates at least 30% of Python code in major repositories in the U.S. And in enterprise environments at Google and Microsoft, 30–40% of new code is reported as AI-generated. That’s not a tweak in tooling. That’s a turning point in how software gets built. 

From Code to Composition 

For decades, the dominant paradigm in programming was one of writing: the developer’s job was to build logic from scratch, test it for accuracy, and ensure it could scale. As complexity grew, so did the stack of tools, including IDEs, frameworks, QA platforms, and versioning systems to support that work. 

But in the last few years, the developer toolbox has changed dramatically. Tools like GitHub Copilot, Claude Code, and Cursor now generate reliable code in real time. Entire modules can be scaffolded with a few prompts. Meanwhile, cloud platforms like AWS offer modular services that handle everything from authentication to observability out of the box. 

The result? Developers are shifting from authors to orchestrators. The value isn’t in how much code they can write; it’s in how well they can assemble, adapt, and govern systems that are increasingly AI-enabled, cloud-native, and composable. 

Productivity and Quality Are Improving, but Are We Building the Right Thing?

AI-assisted development produces measurable gains. Code is being written faster. Boilerplate is disappearing. Bugs are easier to catch early. Even tests can be autogenerated. And yet, one challenge persists: verifying that the right thing is being built. 

It’s relatively straightforward to measure productivity (lines of code, lead time) and quality (bug rates, test coverage). But ensuring correct functionality, such as matching what’s shipped to product requirements, user stories, and EPICs, is harder than ever. Code generation tools accelerate output, but they don’t always ensure alignment with intent. 

That’s why the developer’s role is expanding. Understanding product vision, aligning technical architecture with business goals, and managing evolving requirements are becoming just as critical as technical skill. 

What Should Engineering Leaders Expect from Modern Developers? 

The pace of innovation in AI development tools is relentless. What a developer learns today may be outdated in a few months. This puts enormous pressure on engineering leaders to balance experimentation with sustainability. 

The safest path forward? Anchor learning and experimentation within robust cloud ecosystems. AWS, for instance, offers stable development trajectories, strong security guardrails, and continuous improvements that minimize disruption. The goal isn’t to chase every new tool; it’s to build foundational fluency and adapt deliberately. 

To succeed in this new environment, developers must think differently: 

Code Isn’t Dead, but It’s Being Delegated 

Let’s be clear: programming isn’t going away. But its role is evolving. The most impactful developers won’t be those who write the most lines of code; they’ll be the ones who know how to compose, configure, and coordinate intelligent systems with speed and confidence.

They’ll use prompts, ontologies, and models as naturally as they once used loops and conditionals. They’ll know when to generate, when to review, and when to intervene. And they’ll be deeply embedded in outcome-oriented thinking. 

What Should Engineering Leaders Do Next? 

As the role of the programmer changes, so too must the systems that support them. This means: 

The ground is shifting. But for organizations willing to embrace this change, the opportunity is enormous: faster iteration, stronger alignment, and more resilient systems—built by developers who think in outcomes, not just code. 

Robots & Pencils: Redefining the Role, Rebuilding the Foundation 

At Robots & Pencils, we’ve spent over a decade helping organizations adapt to shifts in software architecture and engineering practice. As developers move from coding line-by-line to orchestrating intelligent, cloud-native systems, our role is to help them and their leaders make that leap with confidence. 

We design secure, cloud-native environments that empower developers to compose, not just code. With Anthropic as our preferred LLM provider and a track record of building modular, scalable solutions, we give teams the foundation they need to experiment responsibly, build faster, and deliver more value without compromising on security or quality.

For teams rethinking what it means to “write software,” we bring the expertise, architecture, and systems design to make the next role of the developer a strength, not a risk. 

Patrick Higgins Named Chief Revenue Officer at Robots & Pencils

From IBM transformation to AI-powered product strategy, Higgins brings proven enterprise expertise and client-first vision to fuel the firm’s next phase of commercial expansion

Robots & Pencils, an AI-first, global digital innovation firm specializing in cloud-native web, mobile, and app modernization, today announced the appointment of Patrick Higgins as Chief Revenue Officer (CRO). A seasoned technology leader with over 15 years of experience driving digital innovation for Fortune 500 companies, Higgins steps into a pivotal role to deepen client partnerships, scale impact, and fuel the company’s next phase of growth.

Higgins built his career at the intersection of digital product development, enterprise transformation, and applied AI. He began at IBM delivering mission-critical programs for large government and healthcare clients. Higgins then spent nearly a decade at WillowTree, where he helped scale the firm into a full-service digital agency and led go-to-market efforts across Media, Healthcare, and most recently, AI. Over the past year alone, he has advised more than 80 organizations on how to turn AI ambitions into action through strategic governance, rapid prototyping, and practical deployment strategies.

Now, as CRO at Robots & Pencils, Higgins will lead all commercial operations, with a focus on aligning strategy, sales, and client partnerships to help organizations unlock the full potential of AI, cloud-native architecture, and next-generation experiences.

“Patrick’s ability to listen deeply, build trust, and connect business goals to technical outcomes is exceptional,” said Leonard Pagon, CEO of Robots & Pencils. “He doesn’t just understand AI—he understands how to activate it inside the enterprise. He’s helped clients across industries turn emerging tech into scalable solutions, and his presence here marks a key step in our evolution. We’re not chasing growth for growth’s sake—we’re scaling the way we serve our clients. Patrick is the right leader to ensure that growth stays grounded in trust, results, and partnership.”

Higgins joins a growing executive team committed to challenging the traditional global systems integrators with a model that prioritizes speed, strategy, and elite delivery. With global centers of excellence and strategic partnerships with AWS, Salesforce, and Databricks, Robots & Pencils is positioned to help clients move beyond experimentation and into meaningful, AI-infused transformation.

“What drew me to Robots & Pencils is the caliber of the team and the clarity of the mission,” said Higgins. “We’re not just talking about AI. We’re delivering it—wrapped in thoughtful design, modern cloud infrastructure, and agile engineering. This is a firm built to move fast and deliver real results, and I’m honored to help lead the next chapter.”

In addition to his leadership role, Higgins is an active contributor to the AI community, serving as a panelist for the University of Virginia’s AI initiatives and helping organizations demystify their path to innovation. He holds a BA and MBA from the University of Virginia and lives in Charlottesville with his family.

Context Is King: How AWS & Anthropic Are Redefining AI Utility with MCP 

If AI is going to work at scale, it needs more than a model; it needs access, structure, and purpose. 

At the AWS Summit in New York City, one phrase stuck with us: 

 “Models are only as good as the context they’re given.” 

It came during an insightful joint session from AWS and Anthropic on Model Context Protocol (MCP), a deceptively simple concept with massive implications. 

Across this recap series, we’ve explored the rise of agentic AI, the infrastructure required to support it, and the ecosystem AWS is building to accelerate adoption. MCP is the connective tissue that brings it all together. It’s how you move from smart models to useful systems. 

Why Context Is the New Bottleneck 

Generative AI has been evolving fast, but enterprise implementation is still slow. Why? 

Because no matter how advanced your model is, it can’t help you make better decisions if it’s not connected to what makes your business unique: Your data. Your tools. Your systems. Your users. 

That’s where MCP comes in. 

What Is MCP—and Why It Matters 

Model Context Protocol (MCP) is a specification that allows AI models to dynamically discover and interact with third-party tools, data sources, and instructions. Think of it as a structured interface: systems publish a list of tools, what they do, the inputs they require, and how the model should use them.

For executives, that means your AI agents can tap into real business logic—not by guessing, but by calling documented resources your teams control. For engineers, it means you can expose functions, services, or datasets via an MCP server, enabling LLMs to perform meaningful actions without hardcoding every step. 

The result? AI that doesn’t just respond—it executes, using tools it finds and understands in real time. 
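For engineers, a minimal MCP server can be only a few lines. The sketch below uses the FastMCP helper from the official Python SDK; the tool itself is a stub, and the server name and function are illustrative.

```python
# A minimal MCP server sketch using the FastMCP helper from the official
# Python SDK. The tool body is a stub; the server name and function are
# illustrative, not a real integration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-tools")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Look up the fulfillment status of an order by its id."""
    # stub: replace with a call into your real order system
    return f"Order {order_id}: shipped, arriving Thursday"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for an MCP-capable client
```

The docstring and type hints are what the protocol publishes, so a connected model can discover the tool, understand its inputs, and call it without hardcoded glue.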

With MCP, you can: 

In short: MCP allows generative AI to break free of the chat window and take real-world action.  

Real Integration, Not Just Model Tuning 

With MCP servers already available in AWS, your teams can start building agentic AI products that can utilize your unique business logic, customer data, and internal systems. This isn’t hypothetical. It’s real and ready to deploy today. 

At Robots & Pencils, we’re already using this pattern with our clients: 

We call this approach Emergent Experience Design, a framework for building systems where agents adapt, interfaces evolve, and outcomes unfold through interaction. If you’re rethinking UX in the age of AI, this is where to start. 

And when you combine this with what we covered in The Future Is Agentic, Modernization Reloaded, and From AI to Execution, you start to see the bigger picture: Agentic AI isn’t just a new model. It’s a new way of working. And context is the infrastructure it runs on. 

Plug AI into the Business, Not Just the Cloud 

The hype phase of generative AI is behind us. What matters now is how well your systems can support intelligent action. If you want AI that drives real outcomes, you don’t just need better models. You need better context. That’s the promise of MCP—and the opportunity ahead for organizations ready to take the next step. 

If you’re experimenting with GenAI and want to connect it to your real-world data and systems, we should talk.