Jeff Kirk Named Executive Vice President of Applied AI at Robots & Pencils 

From Alexa to Emma, Kirk brings two decades of AI breakthroughs that have reshaped industries. Now he’s powering Robots & Pencils’ rise in the intelligence age. 

Robots & Pencils, an AI-first, global digital innovation firm specializing in cloud-native web, mobile, and app modernization, today announced the executive appointment of Jeff Kirk as Executive Vice President of Applied AI. A seasoned technology leader with a career spanning global agencies, startups, and Fortune 100 enterprises, Kirk steps into this newly created role to accelerate the firm’s AI-first vision and unlock transformative outcomes for clients. As EVP of Applied AI, Kirk will lead the firm’s strategy and delivery of AI-powered products and enterprise AI solutions across industries.

Explore how Robots & Pencils blends science and design to build market leaders. 

Kirk’s track record speaks for itself, with AI breakthroughs that fueled customer engagement and business growth. He founded and scaled Moonshot, an intelligent digital products company later acquired by Pactera, where he spearheaded next-generation experiences in voice, augmented reality, and enterprise digitalization. At Amazon, he served as International Product & Technology Lead for Alexa, driving AI-powered personal assistant expansion to millions of households and users worldwide. Most recently, at bswift, Kirk led AI & Data as VP, delivering conversational AI breakthroughs with the award-winning Emma assistant and GenAI-powered EnrollPro decision support system. 

Across each of these roles runs a common thread. Kirk builds and scales innovations that transform how industries work, creating technologies that move from experimental to essential at breathtaking speed. 

“Jeff has been at the frontier of every major shift in digital innovation,” said Len Pagon, CEO of Robots & Pencils. “From shaping the future of eCommerce and mobile platforms at Brulant and Rosetta, to pioneering global voice AI at Amazon, to launching AI-driven customer experiences at bswift, Jeff has consistently delivered what’s next. He doesn’t just talk about AI. He builds products that millions use every day. With Jeff at the helm of Applied AI, Robots & Pencils is sharpening its challenger edge, helping clients leap ahead while legacy consultancies struggle to catch up. I’m energized by what this means for our clients and inspired by what it means for our people.” 

Across two decades, Kirk has built a reputation for translating complex business requirements into enterprise-grade AI and technology solutions that scale, stick, and generate measurable results. His entrepreneurial mindset and hands-on leadership style uniquely position him to help clients experiment, activate, and operate AI across their businesses. 

“Organizations and their workers are under pressure to innovate on behalf of customers while simultaneously learning to work with a new type of co-worker: artificial intelligence,” said Kirk. “The steps we take together to learn to work differently will lead to the most outsized innovation in our industries. I’m thrilled to join Robots & Pencils to push the boundaries of what’s possible with AI, to deliver outcomes that matter for our clients and their customers, and to create opportunities for our teams to do the most meaningful work of their careers.” 

Kirk began his career at Brulant and Rosetta, where he worked alongside Pagon and other Robots & Pencils executive team members, leading engineering and solutions architecture across content, commerce, mobile, and social platforms. His return to the fold marks both a reunion and a reinvention, positioning Robots & Pencils as a leader in applied AI at scale.

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request an AI briefing.  

The $150K PDF That Nobody Reads: From Research Deliverables to Living Systems 

A product executive slides open her desk drawer. Tucked between old cables and outdated business cards is a thick, glossy report. The binding is pristine, the typography immaculate, the insights meticulously crafted. Six figures well spent, at least according to the invoice. Dust motes catch the light as she lifts it out: a monument to research that shaped… nothing, influenced… no one, and expired the day it was delivered. 

It’s every researcher’s quiet fear. The initiative they poured months of work, a chunk of their sanity, and about a thousand sticky notes into becomes shelf-ware. Just another artifact joining strategy decks and persona posters that never found their way into real decisions. 

This is the way research has been delivered for decades, by global consultancies, boutique agencies, and yes, even by me. At $150K a report, it sounds extravagant. But when you consider the sheer effort, the rarity of the talent involved, and the stakes of anchoring business decisions in real customer insight, it’s not hard to see why leaders sign the check. 

The issue isn’t the value of the research. It’s the belief that insights should live in documents at all. 

Research as a Living System 

Now picture a different moment. The same executive doesn’t reach for a drawer. She opens her laptop and types: “What causes the most friction when ordering internationally?” 

Within seconds she’s reviewing tagged quotes from dozens of interviews, seeing patterns of friction emerge, even testing new messaging against synthesized persona responses. The research isn’t locked in a PDF. It’s alive, queryable, and in motion. 

This isn’t a fantasy. It’s the natural evolution of how research should work: not as one-time deliverables, but as a living system.

The numbers show why change is overdue. Eighty percent of Research Ops & UX professionals use some form of research repository, but over half report fair or poor adoption. The tools are frustrating, time-consuming to maintain, and lack clear ownership. Instead of mining the insights they already have, teams commission new studies, resulting in an expensive cycle of creating artifacts that sit idle while decisions move on without them.

It’s a Usability Problem 

Research hasn’t failed because of weak insights. It’s been constrained by the static format of reports. Once findings are bound in a PDF or slide deck, the deliverable has to serve multiple audiences at once, and it starts to bend under its own weight. 

For executives, the executive summary provides a clean snapshot of findings. But when the time comes to make a concrete decision, the summary isn’t enough. They have to dive into the hundred-page appendix to trace back the evidence, which slows down the moment of action. 

On the other hand, product teams don’t need summaries; they need detailed insights for the feature they’re building right now. In long static reports, those details are often buried or disconnected from their workflow. Sometimes they don’t even realize the answer exists at all, so the research goes unused, or even gets repeated. An insight that can’t be surfaced when it’s needed might as well not exist.

The constraint isn’t the quality of the research. It’s the format. Static deliverables fracture usability across audiences and leave each group working harder than they should to put insights into play. 

Research as a Product 

While we usually view research as an input into products, research itself is a product too. And with a product mindset, there is no “final deliverable,” only an evolving body of user knowledge that grows in value over time. 

In this model, the researcher acts as a knowledge steward of the user insight “product,” curating, refining, and continuously delivering customer insights to their users: the executives, product managers, designers, and engineers who need insights in different forms and at different moments. 

Like any product, research needs a roadmap. It has gaps to fill, like user groups not yet heard from, or behaviors not yet explored. It has features to maintain, like transcripts, coded data, and tagged insights. And it has adoption goals, because insights only create value when people use them.

This approach transforms reports too. A static deck becomes just a temporary framing of the knowledge that already exists in the system. With AI, you can auto-generate the right “version” of research for the right audience, such as an executive summary for the C-suite, annotations on backlog items for product teams, or a user-centered evaluation for design reviews. 

Treating research as a product also opens the door to continuous improvement. A research backlog can track unanswered questions, emerging themes, and opportunities for deeper exploration. Researchers can measure not just delivery (“did we produce quality insights?”) but usage (“did the insights influence a decision?”). Over time, the research “product” compounds in value, becoming a living, evolving system rather than a series of static outputs. 

This new model requires a new generation of tools. AI can now cluster themes, surface patterns, simulate persona responses, and expose insights through natural Q&A. AI makes the recomposition of insights into deliverables cheap. That allows us to focus on how our users get the insights they need in the way they need them. 
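To make that concrete, here’s a minimal sketch of what a queryable insight store can look like, assuming a simple tagged-quote repository and the Anthropic Python SDK; the repository shape, tags, and model id are illustrative stand-ins, not a real research platform.

```python
# A minimal sketch, not a product: a tagged-quote store plus an LLM call.
# Assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment;
# the repository fields, tags, and model id are illustrative.
import anthropic

repository = [
    {"quote": "Customs fees surprised me at checkout.", "tags": ["international", "pricing"]},
    {"quote": "Tracking stopped updating once the package left the country.", "tags": ["international", "logistics"]},
    {"quote": "I abandoned my cart when shipping quoted three weeks.", "tags": ["international", "logistics"]},
]

def ask(question: str, tag: str) -> str:
    """Pull tagged evidence, then let the model synthesize a cited answer."""
    evidence = "\n".join(f"- {item['quote']}" for item in repository if tag in item["tags"])
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: use your deployed model
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nTagged interview quotes:\n{evidence}\n\n"
                       "Answer using only this evidence, citing the quotes you rely on.",
        }],
    )
    return message.content[0].text

print(ask("What causes the most friction when ordering internationally?", "international"))
```

The point isn’t the specific stack: once insights live as structured, tagged data, recomposing them for a new question or audience becomes a cheap query rather than a new study.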

From Deliverable to Product 

Treating research as a product changes the central question. It’s no longer, “What should this report contain?” but “What questions might stakeholders need to answer, and how do we make those answers immediately accessible?” 

When research is built for inquiry, every transcript, survey, and usability session becomes part of a living knowledge base that compounds in value over time. Success shifts too: it’s measured not by the number of reports delivered, but by how often insights are pulled into decisions. A six-figure investment should inform hundreds of critical choices, not one presentation that fades into archives.

And here’s the irony: the product mindset actually produces better reports as well. When purpose-built reports focus as much on their usage as on the information they contain, they become invaluable components of the software production machine.

Research itself isn’t broken. It just needs a product mindset and AI-based qualitative analysis tools that turn insights into a living system, not a slide deck.

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request a strategy session.  

How Agentic AI Is Rewiring Higher Education 

A University Without a Nervous System 

Walk through the back offices of most universities, and you will see the challenge. Admissions runs on one platform, advising on another, learning management on a third, and academic affairs on a fourth. Each system functions, yet little connects them. Students feel the gaps when financial aid processing is delayed, academic records are incomplete, and support processes remain confusing and slow. Leaders feel it in the cost of complexity and the weight of compliance. 

Higher education institutions typically manage dozens of disconnected systems. IT leaders face persistent integration challenges that consume substantial staff time and budget while creating operational bottlenecks that affect both student services and institutional agility.

For decades, CIOs and CTOs have been tasked with stitching these systems together. Progress came in patches, with integrations here and dashboards there. What emerged looked more like scar tissue than connective tissue. Patchwork technology blocks digital transformation in higher education, and leaders now seek infrastructure that can unify rather than just connect. 

The Rise of Agentic AI as Connective Tissue 

Agentic AI wires the university together. Acting like a nervous system, it routes information and triggers actions throughout the institution, coordinating workflows through intelligent routing and contextual decision-making. Unlike traditional automation that follows rigid rules, agentic AI systems can make contextual decisions, learn from outcomes, and coordinate across multiple platforms without constant human oversight. 

In practice, this means a transfer request automatically verifies transcripts through the National Student Clearinghouse, cross-references degree requirements in the SIS, flags discrepancies for staff to review, and updates student records, typically reducing processing time from 5-7 days to under 24 hours while maintaining accuracy. It means an advising system can recognize a retention risk, trigger outreach, and log the interaction without human staff piecing the puzzle together by hand. 
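As a schematic illustration only, the flow described above might be orchestrated like the sketch below; the integrations are stubbed, and every name is a hypothetical stand-in for real campus systems such as the Clearinghouse and the SIS.

```python
# A schematic sketch of the transfer-credit flow described above. The
# integrations are stubbed; every name here is hypothetical and stands in
# for real campus systems.
from dataclasses import dataclass, field

@dataclass
class TransferRequest:
    student_id: str
    transcript_verified: bool = False
    discrepancies: list[str] = field(default_factory=list)

def verify_transcript(req: TransferRequest) -> None:
    req.transcript_verified = True  # stub: call the Clearinghouse API here

def cross_reference_degree_requirements(req: TransferRequest) -> None:
    # stub: compare transferred credits against SIS degree requirements
    req.discrepancies.append("ENG 201: no direct equivalent found")

def process(req: TransferRequest) -> str:
    verify_transcript(req)
    cross_reference_degree_requirements(req)
    if req.discrepancies:
        # The agent routes exceptions to a human; it does not decide alone.
        return f"escalated to advisor: {req.discrepancies}"
    return "records updated automatically"

print(process(TransferRequest(student_id="S-1024")))
```

The design point is the escalation path: routine cases complete on their own, while discrepancies surface for human review.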

Agentic AI needs a strong foundation. That foundation is cloud-native infrastructure for universities that’s built to scale during peak demand, enforce compliance, and keep every action visible. With this base in place, universities move from pilot projects to production systems. The result is infrastructure that holds under pressure and adapts when conditions change. 

The Brain Still Decides 

A nervous system does not think on its own. It carries signals to the brain, where decisions are made. In the university context, the brain is still human, made up of faculty, advisors, administrators, and executives.

This is where the design philosophy matters. Agentic AI should amplify human capacity, not replace it. Advisors can spend more time in meaningful conversations with students because degree audits and schedule planning run on their own. CIOs can focus on strategic alignment because monitoring and audit logs are captured automatically. The architecture creates space for judgment, and it also creates space for human connection that strengthens the student experience. 

However, this transition requires careful change management. Faculty often express concerns about AI decision-making transparency, while staff worry about job displacement. Successful implementations address these concerns through clear governance frameworks, explainable AI requirements, and retraining programs that position staff as AI supervisors rather than replacements. 

What Happens When Signals Flow Freely 

When agentic systems begin to carry the load, universities see a different rhythm. Transcript processing moves with speed. Advising interactions trigger at the right time. Students find support without friction. Leaders gain resilience as workflows carry themselves from start to finish. What emerges is more than efficiency. It is an institution that thinks and acts as one, with every part working in concert to support the student journey. 

Designing for Resilience and Trust 

CIOs and CTOs recognize that orchestration brings new responsibility. Data must be structured and governed, with student information requiring FERPA-compliant handling throughout all automated processes. Agents must be observable and auditable. Compliance cannot live as a separate checklist but as a property of the system itself. AWS-native controls, from encryption to identity management, provide the levers to design with security as a default rather than a bolt-on.

At the same time, leaders must design for operational trust. A nervous system functions only when signals are reliable. This requires real-time monitoring dashboards, clear escalation protocols when agents encounter exceptions, and audit trails that document every automated decision. 

The Next Chapter of Higher Education Infrastructure 

What is happening now is less about another wave of apps and more about a shift in the foundation of the institution. Agentic AI is beginning to operate as infrastructure. It connects the university’s digital systems into something coordinated and adaptive. 

The role of leadership is to decide how that nervous system will function, and what kind of human judgment it will amplify. Presidents, provosts, CIOs, and CTOs who recognize this shift will shape not only the student experience but the operational resilience of their institutions for years to come. 

For leaders evaluating agentic AI initiatives, three factors determine readiness.  

Institutions strong in all three areas see faster implementation and higher adoption rates. 

The institutions that succeed will be those that view agentic AI not as a technology project, but as an organizational transformation requiring new governance models, staff capabilities, and student engagement strategies. 

When the nervous system works, the signals move freely, and people do their best work. Students find support when they need it. Advisors focus on real conversations. Leaders see further ahead. That is the promise of agentic AI in higher education, not machines in charge, but machines carrying the load so people can do what only people can do. 

Join Us

Join us at ASU’s Agentic AI and the Student Experience conference. Contact us to book time with our leaders and explore how agentic AI can strengthen your institution. 

Request an AI Briefing.  

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Learn more about Robots & Pencils AI Solutions for Education. 

Beyond Wrappers: What Protocols Leave Unsolved in AI Systems 

I recently built a Model Context Protocol (MCP) integration for my Oura Ring. Not because I needed MCP, but because I wanted to test the hype: Could an AI agent make sense of my sleep and recovery data? 

It worked. But halfway through I realized something. I could have just used the Oura REST API directly with a simple wrapper. What I ended up building was basically the same thing, just with extra ceremony. 
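For comparison, here’s roughly what that “simple wrapper” looks like: a direct call to Oura’s v2 REST API exposed as an ordinary function that native function calling can invoke. Treat the endpoint and token handling as illustrative rather than production code.

```python
# The "simple wrapper" alternative: a direct call to Oura's v2 REST API,
# exposed as an ordinary function an agent can invoke via native function
# calling. Endpoint and auth details are illustrative, not production code.
import os
import requests

def get_daily_sleep(start_date: str, end_date: str) -> dict:
    """Fetch daily sleep summaries between two ISO dates (YYYY-MM-DD)."""
    response = requests.get(
        "https://api.ouraring.com/v2/usercollection/daily_sleep",
        headers={"Authorization": f"Bearer {os.environ['OURA_TOKEN']}"},
        params={"start_date": start_date, "end_date": end_date},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

print(get_daily_sleep("2025-01-01", "2025-01-07"))
```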

As someone who has architected enterprise AI systems, I understand the appeal. Reliability isn’t optional, and protocols like MCP promise standardization. To be clear, MCP wasn’t designed to fix hallucinations or context drift. It’s a coordination protocol. But the experiment left me wondering: Are we solving the real problems or just adding layers? 

The Wrapper Pattern That Won’t Go Away 

MCP joins a long list of frameworks like LangChain, LangGraph, SmolAgents, and LlamaIndex, each offering a slightly different spin on coordination. But at heart, they’re all wrappers around the same issue: getting LLMs to use tools consistently.

Take CrewAI. On paper, it looked elegant, with agents organized into “crews,” each with roles and tools. The demos showed frictionless orchestration. In practice? The agents ignored instructions, produced invalid JSON even after careful prompting, and burned days in debugging loops. When I dropped down to a lower-level tool like LangGraph, the problems vanished. CrewAI’s middleware hadn’t added resilience; it had hidden the bugs.

This isn’t an isolated frustration. Billions of dollars are flowing into frameworks while fundamentals like building reliable agentic systems remain unsettled. MCP risks following the same path. Standardizing communication may sound mature, but without solving hallucinations and context loss, it’s just more scaffolding on shaky foundations. 

What We’re Not Solving 

The industry has been busy launching integration frameworks, yet the harder challenges remain stubbornly in place: 

As CData notes, these aren’t just implementation gaps. They’re fundamental challenges. 

What the Experiments Actually Reveal 

Working with MCP brought a sharper lesson. The difficulty isn’t about APIs or data formats. It’s about reliability and security. 

When I connected my Oura data, I was effectively giving an AI agent access to intimate health information. MCP’s “standardization” amounted to JSON-RPC endpoints. That doesn’t address the deeper issue: How do you enforce “don’t share my health data” in a system that reasons probabilistically? 
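There are deterministic mitigations worth layering in; the sketch below filters sensitive fields out of tool results before they ever reach the model. The field names are hypothetical, and this narrows what the model can see rather than solving probabilistic leakage.

```python
# One deterministic mitigation: strip sensitive fields from tool output
# before it ever reaches the model. A sketch only; the field names are
# hypothetical, and this narrows exposure rather than solving the
# underlying probabilistic problem.
SENSITIVE_FIELDS = {"heart_rate", "hrv", "temperature_deviation"}

def redact(payload: dict) -> dict:
    """Drop sensitive keys recursively before handing data to the agent."""
    return {
        key: redact(value) if isinstance(value, dict) else value
        for key, value in payload.items()
        if key not in SENSITIVE_FIELDS
    }

raw = {"day": "2025-01-01", "score": 82, "hrv": 48, "detail": {"temperature_deviation": 0.3}}
print(redact(raw))  # {'day': '2025-01-01', 'score': 82, 'detail': {}}
```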

To be fair, there’s progress. Auth0 has rolled out authentication updates, and Anthropic has improved Claude’s function-calling reliability. But these are incremental fixes. They don’t resolve the architectural gap that protocols alone can’t bridge. 

The Evidence Is Piling Up 

The risks aren’t theoretical anymore. Security researchers keep uncovering cracks.

Meanwhile, fragmentation accelerates. Merge.dev lists half a dozen MCP alternatives. Zilliz documents the “Great AI Agent Protocol Race.” Every new protocol claims to patch what the last one missed. 

Why This Goes Deeper Than Protocol Wars 

The adoption curve is steep. Academic analysis shows MCP servers grew from around 1,000 early this year to over 14,000 by mid-2025. With $50B+ in AI funding at stake, we’re not just tinkering with middleware; we’re building infrastructure on unsettled ground. 

Protocols like MCP can be valuable scaffolding. Enterprises with many tools and models do need coordination layers. But the real breakthroughs come from facing harder questions head-on: 

These problems exist no matter the protocol. And until they’re addressed, standardization risks becoming a distraction. 

The question isn’t whether MCP is useful; it’s whether the focus on protocol standardization is proportional to the underlying challenges. 

So Where Does That Leave Us? 

There’s nothing wrong with building integration frameworks. They smooth edges and create shared patterns. But we should be honest about what they don’t solve. 

For many use cases, native function calling or simple REST wrappers get the job done with less overhead. MCP helps in larger enterprise contexts. Yet the core challenges, reliability and security, remain active research problems. 

That’s where the true opportunity lies. Not in racing to the next protocol, but in tackling the questions that sit at the heart of agentic systems. 

Protocols are scaffolding. They’re not the main event. 

Learn more about Agentic AI. 

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request a strategy session.  

Stop Measuring AI Success by Lines of Code: The Real ROI is in the Boring Stuff 

The headlines are hard to miss: “AI-powered code generation boosting developer velocity by 30%.” Lines of code written per hour skyrocketing. Teams shipping features faster than ever.

Yet the most significant returns aren’t showing up in those flashy metrics. The real ROI is emerging in places far less glamorous: the work that usually gets postponed, rushed, or quietly skipped. 

The Quality Underground 

While much attention is placed on code generation speed, something more consequential is happening behind the scenes. AI is proving most valuable when it tackles the tedious but essential work developers often deprioritize. 

Test creation. Documentation updates. Boilerplate scaffolding. The quiet foundations of reliable software. 

When testing becomes easier, teams actually do it. When documentation updates itself, it actually stays current. Organizations using AI-augmented testing report 50% lower costs and 60% faster test cycles¹. That’s more than efficiency. It’s a shift in quality assurance discipline. 

A clear pattern is emerging: the less exciting the task, the greater the AI payoff. 

The Multiplier Effect 

This is where traditional measurements fall short. Counting lines of code tells us little about stability. Shipping features faster is less impressive if those features fail in production. 

By contrast, metrics like test coverage and documentation completeness tell a different story. They reveal AI as a speed accelerator and a quality multiplier. 

Some organizations are already seeing dramatic improvements, with test coverage climbing from 60% to 85%, documentation kept current for the first time in years, and edge cases automatically captured. 

The takeaway is straightforward. AI makes developers quicker, and it makes the software they build more reliable. 

The Tasks That Actually Matter 

Consider the flow of software development. Writing business logic is often the easy part. The heavier lift comes in the margins: building robust test suites, maintaining documentation, handling edge cases thoroughly. 

These are the tasks that are critical for quality, slow to complete, and frequently sacrificed under pressure. They are also the exact tasks where AI thrives. 

Take test generation. Creating comprehensive tests often takes longer than the code itself, demanding developers think through failures and integration scenarios. AI can analyze code patterns, detect gaps, and generate tests that human teams might overlook. The result is not just faster coverage, but broader and more consistent coverage. 
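As a rough sketch of what that looks like in practice, the snippet below asks a model to draft pytest tests for a small function using the Anthropic Python SDK; the prompt shape and model id are illustrative assumptions, and generated tests remain a draft until a developer reviews and runs them.

```python
# A sketch of prompt-driven test generation with the Anthropic Python SDK;
# the prompt shape and model id are illustrative, and the output is a
# draft that a developer must still review and run.
import anthropic

source = '''
def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return round(price * (1 - percent / 100), 2)
'''

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder: use your deployed model
    max_tokens=800,
    messages=[{
        "role": "user",
        "content": "Write pytest tests for this function. Cover the happy path, "
                   "the boundaries (0 and 100), rounding, and invalid input:\n" + source,
    }],
)
print(message.content[0].text)
```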

The Measurement Revolution 

This shift creates an opening to rethink how AI success is measured. Instead of tracking raw velocity, organizations are following quality indicators:

These indicators surface AI’s true value: not simply producing more code but producing better software. 

The Compound Returns 

Quality improvements have a different kind of payoff: they compound. 

Faster code generation saves time today. Stronger test coverage prevents costly failures tomorrow. Automated documentation reduces onboarding time next quarter. Better quality controls fuel faster iteration next year.

Measured through this lens, AI’s impact becomes clearer. A 50% drop in production bugs delivers far greater financial benefit than a 50% increase in code generation speed. 

The Quality Advantage 

Teams focusing here are building something rare: systematic quality improvement woven into the development process itself. 

Others may continue to compete on speed, but organizations that compete on reliability are building resilience. They’re lowering technical debt instead of accumulating it. They’re creating the conditions for sustainable experimentation. 

Over time, that advantage compounds into a moat that’s hard to cross. 

Reframing Success 

When the next report touts impressive AI coding velocity, a different question is worth asking: “What is happening to quality?”

Because real AI transformation isn’t about developers typing faster. It’s about software that’s more dependable, because the unglamorous work is finally being done. 

Organizations that see this are measuring the right outcomes. They’re finding that the “boring” tasks create the most durable advantages. Those are often the ones that matter most when customers decide whose product they trust. 

The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. Request a strategy session. 

Sources: 

  1. Unisys, ROI of Generative AI in Software Testing, 2024 

Beyond Story Points: Rethinking Software Engineering Productivity in the Age of AI 

Why traditional metrics fall short, and how modern frameworks like DORA and SPACE can guide better outcomes 

For years, engineering leaders have relied on familiar metrics to gauge developer performance: story points, bug counts, and lines of code. These measures offered a shared baseline, especially in Agile environments where estimation and output needed a common language. 

But in today’s AI-assisted world, those numbers no longer tell the full story. Performance isn’t just about volume or velocity. It’s about outcomes. Did the developer deliver the expected functionality, with the right quality, on time? That’s how we compensate today, and that’s still what matters. But how we measure those things must evolve.  

With tools like GitHub Copilot, Claude Code, and Cursor generating entire functions, tests, and documentation quickly, output is becoming less about what a developer types and more about what they model, validate, and evolve. 

The challenge for CIOs, CTOs, and SVPs of Engineering isn’t just adopting new tools. It’s rethinking how to measure effectiveness in a world where productivity is amplified by AI and complexity often hides behind automation. 

Why Traditional Metrics Break Down 

The future of measurement hinges on three categories: productivity, quality, and functionality. These have always been essential to evaluating engineering work. But in the AI era, we must measure them differently. That shift doesn’t mean abandoning objectivity; it means updating our tools. 

The problem isn’t that legacy metrics are useless. It’s that they’re easily gamed, misinterpreted, or disconnected from business value. 

At best, these metrics create noise. At worst, they drive harmful incentives, like rewarding speed over safety, or activity over alignment. 

Today’s AI-assisted workflows lack mature solutions for tracking whether functionality requirements, like EPICs and user stories, have been fully met. But new approaches, like multi-domain linking (MDL), are emerging to close that gap. Measurement is getting smarter, and more connected, because it has to. 

The Rise of Directional Metrics 

Modern frameworks like DORA and SPACE were built to address these gaps. 

DORA (DevOps Research and Assessment) focuses on four delivery metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service.

These measure delivery health, not just effort. They’re useful for understanding how efficiently and safely value reaches users. 
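To make the definitions concrete, here’s a toy computation of two DORA metrics from a deployment log; the record shape is a hypothetical stand-in for whatever your delivery pipeline actually emits.

```python
# A toy computation of two DORA metrics from a deployment log; the record
# shape is hypothetical, shown only to make the definitions concrete.
from datetime import date

deploys = [
    {"day": date(2025, 1, 6), "caused_incident": False},
    {"day": date(2025, 1, 8), "caused_incident": True},
    {"day": date(2025, 1, 9), "caused_incident": False},
    {"day": date(2025, 1, 13), "caused_incident": False},
]

weeks_observed = 1  # Jan 6-13 spans roughly one working week
deployment_frequency = len(deploys) / weeks_observed
change_failure_rate = sum(d["caused_incident"] for d in deploys) / len(deploys)

print(f"deploys/week: {deployment_frequency:.1f}")        # 4.0
print(f"change failure rate: {change_failure_rate:.0%}")  # 25%
```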

SPACE (developed by Microsoft Research) considers five dimensions: satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow.

SPACE offers a more holistic view, especially in cross-functional and AI-assisted teams. It acknowledges that psychological safety, cross-team communication, and real flow states often impact long-term output more than individual commits. 

AI Complicates the Picture 

AI tools don’t eliminate the need for metrics; they demand smarter ones. When an LLM can write 80% of the code for a feature, how do we credit the developer? By the number of keystrokes? Or by their judgment in prompting, curating, and validating what the tool produced? 

But here’s the deeper challenge: What if that feature doesn’t do what it was supposed to? 

In AI-assisted workflows: 

Productivity isn’t just about output; it’s about fitness to purpose. Without strong traceability between code, tests, user stories, and epics, it’s easy for teams to ship fast but fall short of the business goal. 

Many organizations today struggle to answer a basic question: Did this delivery actually fulfill the intended functionality? 

This is where multi-domain linking (MDL) and AI-powered traceability show promise. By connecting user stories, requirements, test cases, design artifacts, and even user feedback within a unified graph, teams can use LLMs to assess whether the output truly matches the input. 
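A minimal sketch of the idea, with hypothetical node ids rather than a published MDL schema: represent the links as edges, then scan for stories missing the links that evidence functionality.

```python
# A minimal sketch of multi-domain linking as a traceability graph; the
# node and edge shapes are hypothetical, not a published MDL schema.
links = [
    ("EPIC-12", "STORY-301"),   # epic -> user story
    ("STORY-301", "PR-88"),     # story -> code change
    ("STORY-301", "TEST-442"),  # story -> test case
    ("STORY-302", "PR-91"),     # story with code but, notably, no test
]

def linked(node: str, kind: str) -> bool:
    """True if `node` has at least one outgoing link whose id starts with `kind`."""
    return any(src == node and dst.startswith(kind) for src, dst in links)

for story in ("STORY-301", "STORY-302"):
    gaps = [kind for kind in ("PR-", "TEST-") if not linked(story, kind)]
    print(story, "gaps:", gaps or "none")
# STORY-302 surfaces with a missing test link: a functionality-coverage gap
```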

And this capability unlocks more than just better alignment; it opens the door to innovation. AI-assisted development enables organizations to build more complex, interconnected, and adaptive systems than ever before. As those capabilities expand, so too must our ability to measure their economic value. What applications can we now build that we couldn’t before? And what is that worth to the business?

That’s not a theoretical exercise. It’s the next frontier in engineering measurement. 

Productivity as a System, Not a Score 

The best engineering organizations treat productivity like instrumentation. No single number can tell you what’s working, but the right mix of signals can guide better decisions. That system must account for both delivery efficiency and functional alignment. High velocity is meaningless if the outcome doesn’t meet the requirements it was designed to fulfill. 

That means: 

Most importantly, it means aligning measurement to what matters: Did the product deliver value? Did it meet its intended function? Was the effort worth the outcome? Those are the questions that still define success and the ones our measurement frameworks must help answer. 

How to Start Rethinking Measurement 

If your metrics haven’t evolved alongside your tooling, here’s how to get started: 

AI is reshaping how software gets built. That doesn’t mean productivity can’t be measured. It means it must be measured differently. The leaders who shift from tracking motion to monitoring momentum will build faster, healthier, and more resilient engineering teams. 

Robots & Pencils: Measuring What Matters in an AI-Driven World 

At Robots & Pencils, we believe productivity isn’t a score; it’s a system. A system that must measure not just speed, but alignment. Did the output meet the requirements? Did it fulfill the epic? Was the intended functionality delivered? 

We help clients extend traditional measurement approaches to fit an AI-first world. That means combining DORA and SPACE metrics with functional traceability, such as linking code to requirements, outcomes to epics, and user stories to business results. 

Our secure, AWS-native platforms are already instrumented for this kind of visibility. And our teams are actively designing multi-domain models that give leaders better answers to the questions they care about most. 

As AI opens the door to applications we never thought were possible, our job is to help you measure what matters, including what’s newly possible. We don’t just help teams move faster. We help them build with confidence and prove it. 

Pilot, Protect, Produce: A CIO’s Guide to Adopting AI Code Tools 

How to responsibly explore tools like GitHub Copilot, Claude Code, and Cursor—without compromising privacy, security, or developer trust 

AI-assisted development isn’t a future state. It’s already here. Tools like GitHub Copilot, Claude Code, and Cursor are transforming how software gets built, accelerating boilerplate, surfacing better patterns, and enabling developers to focus on architecture and logic over syntax and scaffolding. 

The productivity upside is real. But so are the risks. 

For CIOs, CTOs, and senior engineering leaders, the challenge isn’t whether to adopt these tools—it’s how. Because without the right strategy, what starts as a quick productivity gain can turn into a long-term governance problem. 

Here’s how to think about piloting, protecting, and operationalizing AI code tools so you move fast, without breaking what matters. 

Why This Matters Now 

In a recent survey of more than 1,000 developers, 81% of engineers reported using AI assistance in some form, and 49% reported using AI-powered coding assistants daily. Adoption is happening organically, often before leadership even signs off. The longer organizations wait to establish usage policies, the more likely they are to lose visibility and control. 

On the other hand, overly restrictive mandates risk boxing teams into tools that may not deliver the best results and limit experimentation that could surface new ways of working. 

This isn’t just a tooling decision. It’s a cultural inflection point. 

Understand the Risk Landscape 

Before you scale any AI-assisted development program, it’s essential to map the risks: 

These aren’t reasons to avoid adoption. But they are reasons to move intentionally with the right boundaries in place. 

Protect First: Establish Clear Guardrails 

A successful AI coding tool rollout begins with protection, not just productivity. As developers begin experimenting with tools like Copilot, Claude, and Cursor, organizations must ensure that underlying architectures and usage policies are built for scale, compliance, and security. 

Consider: 

For teams ready to push further, Bedrock AgentCore offers a secure, modular foundation for building scalable agents with memory, identity, sandboxed execution, and full observability, all inside AWS. Combined with S3 Vector Storage, which brings native embedding storage and cost-effective context management, these tools unlock a secure pathway to more advanced agentic systems. 

Most importantly, create an internal AI use policy tailored to software development. It should define tool approval workflows, prompt hygiene best practices, acceptable use policies, and escalation procedures when unexpected behavior occurs. 

These aren’t just technical recommendations; they’re prerequisites for building trust and control into your AI adoption journey.

Pilot Intentionally 

Start with champion teams who can balance experimentation with critical evaluation. Identify low-risk use cases that reflect a variety of workflows: bug fixes, test generation, internal tooling, and documentation. 

Track results across three dimensions: 

Encourage developers to contribute usage insights and prompt examples. This creates the foundation for internal education and tooling norms. 

Don’t Just Test—Teach 

AI coding tools don’t replace development skills; they shift where those skills are applied. Prompt engineering, semantic intent, and architectural awareness become more valuable than line-by-line syntax. 

That means education can’t stop with the pilot. To operationalize safely: 

When used well, these tools amplify good developers. When used poorly, they obscure problems and inflate false productivity. Training is what makes the difference. 

Produce with Confidence 

Once you’ve piloted responsibly and educated your teams, you’re ready to operationalize with confidence. That means: 

Organizations that do this well won’t just accelerate development; they’ll build more resilient software teams. Teams that understand both what to build and how to orchestrate the right tools to do it. The best engineering leaders won’t mandate one AI tool or ban them altogether. They’ll create systems that empower teams to explore safely, evaluate critically, and build smarter together.

Robots & Pencils: Secure by Design, Built to Scale 

At Robots & Pencils, we help enterprise engineering teams pilot AI-assisted development with the right mix of speed, structure, and security. Our preferred LLM provider, Anthropic, was chosen precisely because we prioritize data privacy, source integrity, and ethical model design: values we know matter to our clients as much as productivity gains.

We’ve been building secure, AWS-native solutions for over a decade, earning recognition as an AWS Partner with a Qualified Software distinction. That means we meet AWS’s highest standards for reliability, security, and operational excellence while helping clients adopt tools like Copilot, Claude Code, and Cursor safely and strategically. 

We don’t just plug in AI; we help you govern it, contain it, and make it work in your world. From guardrails to guidance, we bring the technical and organizational design to ensure your AI tooling journey delivers impact without compromise. 

The Changing Role of the Computer Programmer 

How generative AI, cloud-native services, and intelligent orchestration are redefining the developer role and what it means for modern engineering teams 

In the early days of computing, programmers were indispensable because they were the only ones who could speak the language of machines. From punch cards to assembly language, software development was hands-on and highly specialized. Even as languages evolved, from COBOL and C to Java and C#, one thing stayed constant: developers wrote every line themselves. 

But that’s no longer true. And it hasn’t been for a while. 

Today, enterprise developers have access to an entirely new class of tools: generative AI, intelligent agents, and secure, cloud-native building blocks that reduce the need to write, or even see, large amounts of code. This shift isn’t superficial. It’s redefining the nature of software development itself. 

A recent Cornell University study reports that AI now generates at least 30% of Python code in major repositories in the U.S. And in enterprise environments at Google and Microsoft, 30–40% of new code is reported as AI-generated. That’s not a tweak in tooling. That’s a turning point in how software gets built. 

From Code to Composition 

For decades, the dominant paradigm in programming was one of writing: the developer’s job was to build logic from scratch, test it for accuracy, and ensure it could scale. As complexity grew, so did the stack of tools, including IDEs, frameworks, QA platforms, and versioning systems to support that work. 

But in the last few years, the developer toolbox has changed dramatically. Tools like GitHub Copilot, Claude Code, and Cursor now generate reliable code in real time. Entire modules can be scaffolded with a few prompts. Meanwhile, cloud platforms like AWS offer modular services that handle everything from authentication to observability out of the box. 

The result? Developers are shifting from authors to orchestrators. The value isn’t in how much code they can write; it’s in how well they can assemble, adapt, and govern systems that are increasingly AI-enabled, cloud-native, and composable. 

Productivity and Quality Are Improving, but Are We Building the Right Thing?

AI-assisted development produces measurable gains. Code is being written faster. Boilerplate is disappearing. Bugs are easier to catch early. Even tests can be autogenerated. And yet, one challenge persists: verifying that the right thing is being built. 

It’s relatively straightforward to measure productivity (lines of code, lead time) and quality (bug rates, test coverage). But ensuring correct functionality, such as matching what’s shipped to product requirements, user stories, and EPICs, is harder than ever. Code generation tools accelerate output, but they don’t always ensure alignment with intent. 

That’s why the developer’s role is expanding. Understanding product vision, aligning technical architecture with business goals, and managing evolving requirements are becoming just as critical as technical skill. 

What Should Engineering Leaders Expect from Modern Developers? 

The pace of innovation in AI development tools is relentless. What a developer learns today may be outdated in a few months. This puts enormous pressure on engineering leaders to balance experimentation with sustainability. 

The safest path forward? Anchor learning and experimentation within robust cloud ecosystems. AWS, for instance, offers stable development trajectories, strong security guardrails, and continuous improvements that minimize disruption. The goal isn’t to chase every new tool; it’s to build foundational fluency and adapt deliberately. 

To succeed in this new environment, developers must think differently: 

Code Isn’t Dead, but It’s Being Delegated 

Let’s be clear: programming isn’t going away. But its role is evolving. The most impactful developers won’t be those who write the most lines of code; they’ll be the ones who know how to compose, configure, and coordinate intelligent systems with speed and confidence.

They’ll use prompts, ontologies, and models as naturally as they once used loops and conditionals. They’ll know when to generate, when to review, and when to intervene. And they’ll be deeply embedded in outcome-oriented thinking. 

What Should Engineering Leaders Do Next? 

As the role of the programmer changes, so too must the systems that support them. This means: 

The ground is shifting. But for organizations willing to embrace this change, the opportunity is enormous: faster iteration, stronger alignment, and more resilient systems—built by developers who think in outcomes, not just code. 

Robots & Pencils: Redefining the Role, Rebuilding the Foundation 

At Robots & Pencils, we’ve spent over a decade helping organizations adapt to shifts in software architecture and engineering practice. As developers move from coding line-by-line to orchestrating intelligent, cloud-native systems, our role is to help them and their leaders make that leap with confidence. 

We design secure, cloud-native environments that empower developers to compose, not just code. With Anthropic as our preferred LLM provider and a track record of building modular, scalable solutions, we give teams the foundation they need to experiment responsibly, build faster, and deliver more value without compromising on security or quality.

For teams rethinking what it means to “write software,” we bring the expertise, architecture, and systems design to make the next role of the developer a strength, not a risk. 

Patrick Higgins Named Chief Revenue Officer at Robots & Pencils

From IBM transformation to AI-powered product strategy, Higgins brings proven enterprise expertise and client-first vision to fuel the firm’s next phase of commercial expansion

Robots & Pencils, an AI-first, global digital innovation firm specializing in cloud-native web, mobile, and app modernization, today announced the appointment of Patrick Higgins as Chief Revenue Officer (CRO). A seasoned technology leader with over 15 years of experience driving digital innovation for Fortune 500 companies, Higgins steps into a pivotal role to deepen client partnerships, scale impact, and fuel the company’s next phase of growth.

Higgins built his career at the intersection of digital product development, enterprise transformation, and applied AI. He began at IBM delivering mission-critical programs for large government and healthcare clients. Higgins then spent nearly a decade at WillowTree, where he helped scale the firm into a full-service digital agency and led go-to-market efforts across Media, Healthcare, and most recently, AI. Over the past year alone, he has advised more than 80 organizations on how to turn AI ambitions into action through strategic governance, rapid prototyping, and practical deployment strategies.

Now, as CRO at Robots & Pencils, Higgins will lead all commercial operations, with a focus on aligning strategy, sales, and client partnerships to help organizations unlock the full potential of AI, cloud-native architecture, and next-generation experiences.

“Patrick’s ability to listen deeply, build trust, and connect business goals to technical outcomes is exceptional,” said Leonard Pagon, CEO of Robots & Pencils. “He doesn’t just understand AI—he understands how to activate it inside the enterprise. He’s helped clients across industries turn emerging tech into scalable solutions, and his presence here marks a key step in our evolution. We’re not chasing growth for growth’s sake—we’re scaling the way we serve our clients. Patrick is the right leader to ensure that growth stays grounded in trust, results, and partnership.”

Higgins joins a growing executive team committed to challenging the traditional global systems integrators with a model that prioritizes speed, strategy, and elite delivery. With global centers of excellence and strategic partnerships with AWS, Salesforce, and Databricks, Robots & Pencils is positioned to help clients move beyond experimentation and into meaningful, AI-infused transformation.

“What drew me to Robots & Pencils is the caliber of the team and the clarity of the mission,” said Higgins. “We’re not just talking about AI. We’re delivering it—wrapped in thoughtful design, modern cloud infrastructure, and agile engineering. This is a firm built to move fast and deliver real results, and I’m honored to help lead the next chapter.”

In addition to his leadership role, Higgins is an active contributor to the AI community, serving as a panelist for the University of Virginia’s AI initiatives and helping organizations demystify their path to innovation. He holds a BA and MBA from the University of Virginia and lives in Charlottesville with his family.

Context Is King: How AWS & Anthropic Are Redefining AI Utility with MCP 

If AI is going to work at scale, it needs more than a model; it needs access, structure, and purpose. 

At the AWS Summit in New York City, one phrase stuck with us: 

 “Models are only as good as the context they’re given.” 

It came during an insightful joint session from AWS and Anthropic on Model Context Protocol (MCP), a deceptively simple concept with massive implications. 

Across this recap series, we’ve explored the rise of agentic AI, the infrastructure required to support it, and the ecosystem AWS is building to accelerate adoption. MCP is the connective tissue that brings it all together. It’s how you move from smart models to useful systems. 

Why Context Is the New Bottleneck 

Generative AI has been evolving fast, but enterprise implementation is still slow. Why? 

Because no matter how advanced your model is, it can’t help you make better decisions if it’s not connected to what makes your business unique: Your data. Your tools. Your systems. Your users. 

That’s where MCP comes in. 

What Is MCP—and Why It Matters 

Model Context Protocol (MCP) is a specification that allows AI models to dynamically discover and interact with third-party tools, data sources, and instructions. Think of it as a structured interface: systems publish a list of tools, what they do, the inputs they require, and how the model should use them.

For executives, that means your AI agents can tap into real business logic—not by guessing, but by calling documented resources your teams control. For engineers, it means you can expose functions, services, or datasets via an MCP server, enabling LLMs to perform meaningful actions without hardcoding every step. 

The result? AI that doesn’t just respond—it executes, using tools it finds and understands in real time. 
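For engineers, a minimal MCP server can be only a few lines. The sketch below uses the FastMCP helper from the official Python SDK; the tool itself is a stub, and the server name and function are illustrative.

```python
# A minimal MCP server sketch using the FastMCP helper from the official
# Python SDK. The tool body is a stub; the server name and function are
# illustrative, not a real integration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-tools")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Look up the fulfillment status of an order by its id."""
    # stub: replace with a call into your real order system
    return f"Order {order_id}: shipped, arriving Thursday"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for an MCP-capable client
```

The docstring and type hints are what the protocol publishes, so a connected model can discover the tool, understand its inputs, and call it without hardcoded glue.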

With MCP, you can: 

In short: MCP allows generative AI to break free of the chat window and take real-world action.  

Real Integration, Not Just Model Tuning 

With MCP servers already available in AWS, your teams can start building agentic AI products that can utilize your unique business logic, customer data, and internal systems. This isn’t hypothetical. It’s real and ready to deploy today. 

At Robots & Pencils, we’re already using this pattern with our clients: 

We call this approach Emergent Experience Design, a framework for building systems where agents adapt, interfaces evolve, and outcomes unfold through interaction. If you’re rethinking UX in the age of AI, this is where to start. 

And when you combine this with what we covered in The Future Is Agentic, Modernization Reloaded, and From AI to Execution, you start to see the bigger picture: Agentic AI isn’t just a new model. It’s a new way of working. And context is the infrastructure it runs on. 

Plug AI into the Business, Not Just the Cloud 

The hype phase of generative AI is behind us. What matters now is how well your systems can support intelligent action. If you want AI that drives real outcomes, you don’t just need better models. You need better context. That’s the promise of MCP—and the opportunity ahead for organizations ready to take the next step. 

If you’re experimenting with GenAI and want to connect it to your real-world data and systems, we should talk.