<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tyler Klein, Author at Robots &amp; Pencils</title>
	<atom:link href="https://robotsandpencils.com/author/tyler-kleinrobotsandpencils-com/feed/" rel="self" type="application/rss+xml" />
	<link>https://robotsandpencils.com/author/tyler-kleinrobotsandpencils-com/</link>
	<description>Digital Innovation Firm</description>
	<lastBuildDate>Mon, 24 Nov 2025 15:27:32 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>

<image>
	<url>https://robotsandpencils.com/wp-content/uploads/2023/04/favicon_rap.png</url>
	<title>Tyler Klein, Author at Robots &amp; Pencils</title>
	<link>https://robotsandpencils.com/author/tyler-kleinrobotsandpencils-com/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>The Agentic Trap: Why 40% of AI Automation Projects Lose Momentum</title>
		<link>https://robotsandpencils.com/agentic-ai-projects-decision-clarity/</link>
		
		<dc:creator><![CDATA[Tyler Klein]]></dc:creator>
		<pubDate>Mon, 24 Nov 2025 15:27:30 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Strategy]]></category>
		<guid isPermaLink="false">https://robotsandpencils.com/?p=3114</guid>

					<description><![CDATA[<p>Gartner’s latest forecast is striking: more than 40% of agentic AI projects will be canceled by 2027. At first glance, this looks like a technology growing faster than it can mature. But a closer look across the industry shows a different pattern. Many initiatives stall for the same reason micromanaged teams do. The work is [&#8230;]</p>
<p>The post <a href="https://robotsandpencils.com/agentic-ai-projects-decision-clarity/">The Agentic Trap: Why 40% of AI Automation Projects Lose Momentum</a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Gartner’s latest forecast is striking: more than <a href="https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027" target="_blank" rel="noreferrer noopener">40%</a> of agentic AI projects will be canceled by 2027. At first glance, this looks like a technology growing faster than it can mature. But a closer look across the industry shows a different pattern. Many initiatives stall for the same reason micromanaged teams do. The work is described at the level of steps rather than outcomes. When expectations aren’t clear, people wait for instructions. When expectations aren’t clear for agents, they either improvise poorly or fail to act.&nbsp;</p>



<p>This is the same shift I described in my previous article, “<a href="https://robotsandpencils.com/softwares-cheap-enough-to-waste/" target="_blank" rel="noreferrer noopener">Software’s Biggest Breakthrough Was Making It Cheap Enough to Waste.</a>” When software becomes inexpensive enough to test freely, the organizations that pull ahead are the ones that work toward clear outcomes and validate their decisions quickly. </p>



<p>Agentic AI is the next stage of that evolution. Autonomy becomes meaningful only when the organization already understands the outcome it’s trying to achieve, how good decisions support that outcome, and when judgment should shift back to a human.&nbsp;</p>



<h2 class="wp-block-heading">The Shift to Outcome-Oriented Programming </h2>



<p>Agentic AI brings a model that feels intuitive but represents a quiet transformation. Traditional automation has always been procedural in that teams document the steps, configure the workflow, and optimize the sequence. Like a highly scripted form of people management, this model is effective when the work is predictable, but limited when decisions are open-ended or require problem solving. </p>



<p>Agentic systems operate more like empowered teams. They begin with a desired outcome and use planning, reasoning, and available tools to move toward it. As system designers, our role shifts from specifying every step to defining the outcome, the boundaries, and the signals that guide good judgment.&nbsp;</p>



<p>Instead of detailing each action, teams clarify:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-9a69521ddf137d3b2eae6e2798440550">
<li>What the outcome should be</li>
<li>How success will be measured</li>
<li>Which contextual signals matter</li>
<li>Where the boundaries and escalation points are</li>
</ul>



<p>This shift places new demands on organizational clarity. To support outcome-oriented systems, teams need a shared understanding of how decisions are made. They need to determine what good judgment looks like, what tradeoffs are acceptable, and how to recognize situations that require human involvement.&nbsp;</p>



<p>Industry research points to the same conclusion. <a href="https://hbr.org/2025/10/why-agentic-ai-projects-fail-and-how-to-set-yours-up-for-success" target="_blank" rel="noreferrer noopener">Harvard Business Review</a> notes that teams struggle when they choose agentic use cases without first defining how those decisions should be evaluated. <a href="https://xmpro.com/gartners-40-agentic-ai-failure-prediction-exposes-a-core-architecture-problem/" target="_blank" rel="noreferrer noopener">XMPRO</a> shows that many failures stem from treating agentic systems as extensions of existing automation rather than as tools that require a different architectural foundation. <a href="https://www.rand.org/pubs/research_reports/RRA2680-1.html" target="_blank" rel="noreferrer noopener">RAND’s</a> analysis adds that projects built on assumptions instead of validated decision patterns rarely make it into stable production.&nbsp;</p>



<p>Together, these findings underscore a simple theme. Agents thrive when the organization already understands how good decisions are made.&nbsp;</p>



<h2 class="wp-block-heading">Decision Intelligence Shapes Agentic Performance  </h2>



<p>Agentic systems perform well when the outcome is clear, the signals are reliable, and proper judgment is well understood. When goals or success criteria are fuzzy, or tasks are overly complex, performance mirrors that ambiguity.</p>



<p>In a <a href="https://www.theregister.com/2025/06/29/ai_agents_fail_a_lot/" target="_blank" rel="noreferrer noopener">Carnegie Mellon</a> evaluation, advanced models completed only about one-third of multi-step tasks without intervention. Meanwhile, <a href="https://firstpagesage.com/seo-blog/agentic-ai-statistics/" target="_blank" rel="noreferrer noopener">First Page Sage’s</a> 2025 survey showed much higher completion rates in more structured domains, with performance dropping as tasks became more ambiguous or context heavy.</p>



<p>This reflects another truth about autonomy. Some problems are simply too broad or too abstract for an agent to manage directly. In such cases, the outcome must be broken into sub-outcomes, and those into smaller decisions, until the individual pieces fall within the system’s ability to reason effectively.&nbsp;</p>



<p>In many ways, this mirrors effective leadership. Good leaders don’t hand individual team members a giant, unstructured mandate. They cascade outcomes into stratified responsibilities that people can act on. Agentic systems operate the same way. They thrive when the goal has been decomposed into solvable parts with well-defined judgment and guardrails.&nbsp;</p>



<p>This is why organizational clarity becomes a core predictor of success.&nbsp;</p>



<h2 class="wp-block-heading">How Teams Fall Into the Agentic Trap </h2>



<p>Many organizations feel the pull of agentic AI because it promises systems that plan, act, and adapt without waiting for human intervention. But the projects that stall often fall into a predictable trap. </p>



<p>Teams begin by automating <em>process</em> instead of automating the <em>judgment </em>behind the decisions the agent is expected to make. Teams define <em>what</em> a system should do instead of defining <em>how </em>to evaluate the output or what “good” should look like. Vague quality metrics, progress signals, and escalation criteria lead to technically valid, strategically mediocre decisions that erode confidence in the system.&nbsp;</p>



<p>The research behind this pattern is remarkably consistent. HBR notes that teams often choose agentic use cases before they understand the criteria needed to evaluate them. XMPRO describes the architectural breakdowns that occur when agentic systems are treated like upgrades to procedural automation. RAND’s analysis shows that assumption-driven decision-making is one of the strongest predictors of AI project failure, while projects built on clear evaluation criteria and validated decision patterns are far more likely to reach stable production.&nbsp;</p>



<p>This is the <em>agentic trap</em>: <strong>trying to automate judgment without first understanding how good judgment is made.</strong> Agentic AI is more than the automation of steps; it is the automation of evaluation, prioritization, and tradeoff decisions. Without clear outcomes, criteria, signals, and boundaries to inform decision-making, the system has nothing stable to scale, and its behavior reflects that uncertainty.</p>



<h2 class="wp-block-heading">A Practical Way Forward: The Automation Readiness Assessment</h2>



<p>Decisions that succeed under autonomy share five characteristics. When one or more are missing, agents need more support:</p>



<ol start="1" class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-bc969a4858c42b6bb23bfe433c006bf1">
<li><strong>Decision Understanding:</strong> Teams document how good decisions are made: not just the steps, but the criteria, signals, and judgment patterns. If a new teammate could reproduce the decision with consistency, the foundation is strong.</li>
<li><strong>Validated Patterns:</strong> The decision has been tested repeatedly with consistent, measurable results. Variance is understood. Edge cases surface early.</li>
<li><strong>Success Metrics:</strong> Clear thresholds define what “good” looks like, what counts as acceptable variance, and when escalation should occur.</li>
<li><strong>Data Signals:</strong> All required information is available, trustworthy, and accessible from a unified interface. Decisions are only as good as the signals behind them.</li>
<li><strong>Governance Boundaries:</strong> Teams define what the agent may and may not do, when it must escalate, and where human oversight remains essential.</li>
</ol>



<p>Have all five? Build with confidence.<br>Only three or four? Pilot with human review to build a live data set.<br>Only one or two? Strengthen your decision clarity before automating.</p>



<p>This approach keeps teams grounded. It turns autonomy from an aspirational leap into a disciplined extension of what already works.&nbsp;</p>
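<p>The scoring lens above can be sketched as a small helper. This is an illustrative sketch only, not part of the original article: the criterion keys mirror the five factors, while the function name and return labels are invented for the example.</p>

```python
# The five readiness criteria from the Automation Readiness Assessment.
CRITERIA = [
    "decision_understanding",
    "validated_patterns",
    "success_metrics",
    "data_signals",
    "governance_boundaries",
]

def readiness_recommendation(present: set[str]) -> str:
    """Map how many criteria a candidate decision satisfies to a next step."""
    score = sum(1 for criterion in CRITERIA if criterion in present)
    if score == 5:
        return "build"    # all five present: build with confidence
    if score >= 3:
        return "pilot"    # three or four: pilot with human review
    return "clarify"      # one or two: strengthen decision clarity first
```

<p>Scoring each candidate decision this way keeps the build, pilot, or clarify choice explicit and repeatable across a portfolio of automation ideas.</p>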



<h2 class="wp-block-heading">The Path to Agentic Maturity </h2>



<p>Agentic AI expands an organization’s capacity for coordinated action, but only when the decisions behind the work are already well understood. The projects that avoid the 40% failure curve do so because they encode judgment into agents, not just process. They clarify the outcome, validate the decision pattern, define the boundaries, and then let the system scale what works.</p>



<p>Clarity of judgment produces resilience, resilience enables autonomy, and autonomy creates leverage. The path to agentic maturity begins with well-defined decisions. Everything else grows from there.&nbsp;</p>






<p><em>The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. </em><a href="https://robotsandpencils.com/contact/" target="_blank" rel="noreferrer noopener"><em>Request an AI briefing.</em></a> </p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Key Takeaways&nbsp;</h2>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-4284fb0b1cfccbdb6e847d0d4351c68d">
<li><strong>Agentic AI only creates leverage when decisions are already well understood.</strong> The strongest projects start from clearly defined outcomes, success metrics, and decision criteria, then give agents room to act within those boundaries.</li>
<li><strong>Outcome-oriented programming replaces step-by-step scripting.</strong> Traditional automation focuses on sequences of tasks. Agentic systems focus on the result, the signals that guide judgment, and the escalation paths that keep risk controlled.</li>
<li><strong>Organizational clarity is the real performance bottleneck.</strong> Agentic systems mirror the quality of the environment around them. Clear outcomes, validated decision patterns, and reliable data signals translate directly into more effective autonomy.</li>
<li><strong>Many failed projects share one root cause: unarticulated decisions.</strong> Initiatives lose momentum when teams automate decisions that have never been documented, measured, or tested, so value becomes hard to demonstrate and risk becomes hard to govern.</li>
<li><strong>The Automation Readiness Assessment turns autonomy into a staged progression.</strong> By evaluating five factors, teams can decide whether to build, pilot with human review, or first strengthen decision clarity before pushing for autonomy.</li>
<li><strong>Agentic maturity follows a sequence.</strong> Clarify outcomes, validate patterns, define governance boundaries, and then scale what works. Clarity of judgment produces resilience, resilience enables autonomy, and autonomy amplifies impact.</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">FAQs&nbsp;</h2>



<p><strong>What is the “agentic trap”?</strong><br>The agentic trap describes what happens when organizations rush to deploy agents that plan and act before they have defined the outcomes, decision criteria, and guardrails those agents require. The technology looks powerful, yet projects stall because the underlying decisions were never made explicit.</p>



<p><strong>How is agentic AI different from traditional automation?</strong>&nbsp;<br>Traditional automation follows a procedural model. Teams document a sequence of steps and the system executes those steps in predictable conditions. Agentic AI starts from an outcome, uses planning and reasoning to choose actions, and navigates toward that outcome using tools, data, and judgment signals. The organization moves from “here are the steps” to “here is the result, the boundaries, and the signals that matter.”&nbsp;</p>



<p><strong>Why do so many agentic AI projects lose momentum?</strong>&nbsp;<br>Momentum fades when teams try to automate decisions that have not been documented, validated, or measured. Costs rise, risk concerns surface, and it becomes harder to show progress against business outcomes. Research from Gartner, Harvard Business Review, XMPRO, and RAND all point to the same pattern: projects thrive when the decision environment is explicit and validated, and they struggle when it is based on assumptions.&nbsp;</p>



<p><strong>What makes a decision “ready” for autonomy?</strong>&nbsp;<br>Decisions are ready for agentic automation when they meet five criteria:&nbsp;</p>



<ol start="1" class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-9366458a60cae21a7293ec9533d95985">
<li><strong>Decision Understanding</strong>: Teams can describe how good decisions are made, including criteria and judgment patterns.</li>
<li><strong>Validated Pattern</strong>: The decision has been tested repeatedly with consistent results and known variance.</li>
<li><strong>Success Metrics</strong>: Clear thresholds define acceptable outcomes and escalation conditions.</li>
<li><strong>Data Signals</strong>: Required information is reliable, available, and accessible from a unified interface.</li>
<li><strong>Governance Boundaries</strong>: The system has clear permissioning, escalation rules, and human oversight points.</li>
</ol>



<p>The more of these elements are present, the more confidently teams can extend autonomy.&nbsp;</p>



<p><strong>How can we use the Automation Readiness Assessment in practice?</strong>&nbsp;<br>Use the five criteria as a simple scoring lens for each candidate decision:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-188cfaefc405b2b291a84901cbad5ac8">
<li>All five present: advance to build and scale.</li>
<li>Three or four present: run a pilot with human review to gather live data and refine the pattern.</li>
<li>One or two present: invest in clarifying and testing the decision before automation.</li>
</ul>



<p>This keeps investment aligned with decision maturity and creates a clear path from experimentation to durable production.&nbsp;</p>



<p><strong>Where should leaders focus first to reach agentic maturity?</strong>&nbsp;<br>Leaders gain the most leverage by focusing on judgment clarity within critical workflows. That means aligning on desired outcomes, success metrics, escalation thresholds, and the signals that inform good decisions. With that foundation, agentic AI becomes a force multiplier for well-understood work rather than a risky experiment in ambiguous territory.&nbsp;</p>
<p>The post <a href="https://robotsandpencils.com/agentic-ai-projects-decision-clarity/">The Agentic Trap: Why 40% of AI Automation Projects Lose Momentum</a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Software’s Biggest Breakthrough Was Making It Cheap Enough to Waste </title>
		<link>https://robotsandpencils.com/softwares-cheap-enough-to-waste/</link>
		
		<dc:creator><![CDATA[Tyler Klein]]></dc:creator>
		<pubDate>Tue, 18 Nov 2025 13:30:25 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Strategy]]></category>
		<category><![CDATA[UX]]></category>
		<guid isPermaLink="false">https://robotsandpencils.com/?p=3109</guid>

					<description><![CDATA[<p>AI and automation&#160;are&#160;making&#160;development quick and affordable.&#160;Now, the future belongs to teams that learn as fast as they build.&#160; Building software takes patience and persistence. Projects run&#160;long,&#160;budgets stretch thin, and crossing the finish line often feels like survival. If we launch something that works, we call it a win.&#160; That rhythm has defined the industry for [&#8230;]</p>
<p>The post <a href="https://robotsandpencils.com/softwares-cheap-enough-to-waste/">Software’s Biggest Breakthrough Was Making It Cheap Enough to Waste </a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">AI and automation are making development quick and affordable. Now, the future belongs to teams that learn as fast as they build.</h2>



<p>Building software takes patience and persistence. Projects run long, budgets stretch thin, and crossing the finish line often feels like survival. If we launch something that works, we call it a win.</p>



<p>That rhythm has defined the industry for decades. But now, the tempo is changing. Kevin Kelly, the founding executive editor of Wired Magazine, once said, <em>“Great technological innovations happen when something that used to be expensive becomes cheap enough to waste.”</em></p>



<p>AI-assisted coding and automation are eliminating the bottlenecks of software development. What once took months or years can now be delivered in days or weeks. Building is no longer the hard part. It’s faster, cheaper, and more accessible than ever.</p>



<p>Now, as more organizations can build at scale, custom software becomes easier to replicate, and its ROI as a competitive advantage grows less predictable. As product differentiation becomes more difficult to maintain, a new source of value emerges: applied learning, or how effectively teams can build, test, adapt, and prove what works.</p>



<p>This new ROI is not predicted up front. It depends on the ability to:</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-34923f9e333f1f283cfb4dd7fee0d0ec">
<li>Build faster to test ideas in the real world.</li>
<li>Learn faster from data, feedback, and outcomes.</li>
<li>Adapt faster to turn proven insights into scalable solutions.</li>
</ul>



<p>The organizations that succeed will learn faster from what they build and build faster from what they learn.</p>



<h2 class="wp-block-heading">From Features to Outcomes, Speculation to Evidence&nbsp;</h2>



<p>Agile transformed how teams build software. It replaced long project plans with rapid sprints, continuous delivery, and an obsession with velocity. For years, we measured progress by how many features we shipped and how fast we shipped them.</p>



<p>But shipping features doesn’t equal creating value. A feature only matters if it changes behavior or improves an outcome, and many don’t. As building gets easier, the hard part shifts to understanding <em>which</em> ideas truly create impact and <em>why.</em></p>



<p>AI-assisted and automated development now make that learning practical. Teams can generate several variations of an idea, test them quickly, and keep only what works best. The work of software development starts to look more like controlled experimentation.</p>



<p>This changes how we measure success. The old ROI models relied on speculative forecasts and business cases built on assumptions about value, timelines, and adoption. We planned, built, and launched, but when the product finally reached users, both the market and the problem had already evolved.</p>



<p>Now, ROI becomes something we earn through proof. We begin with a measurable hypothesis and build just enough to test it:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>If onboarding time falls by 30 percent, retention will rise by 10 percent, </em><br><em>creating two million dollars in annual value.</em></p>
</blockquote>



<p>Each iteration provides evidence. Every proof point increases confidence and directs the next investment. In this way, value creation and validation merge, and the more effectively we learn, the faster our return compounds.</p>
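<p>A hypothesis like the quoted one can be made executable. This is an illustrative sketch only: the proportional-crediting rule and the function name are assumptions, with the 10 percent lift and two-million-dollar figures taken from the example above.</p>

```python
def hypothesis_value(observed_retention_lift: float,
                     predicted_lift: float = 0.10,
                     annual_value: float = 2_000_000) -> float:
    """Credit annual value in proportion to how much of the predicted
    retention lift has actually been observed (capped at 100%)."""
    realized = min(observed_retention_lift / predicted_lift, 1.0)
    return max(realized, 0.0) * annual_value
```

<p>Tracking the hypothesis this way turns return earned through proof into a number that updates with each iteration instead of a one-time forecast.</p>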



<h2 class="wp-block-heading">ROI That Compounds</h2>



<p>ROI used to appear only after launch, when the project was declared “done.” It was calculated as an academic validation of past assumptions and decisions. The investment itself remained a sunk cost, viewed as money spent months ago.</p>



<p>In an outcome-driven model, value begins earlier and grows with every iteration. Each experiment creates two returns: the immediate impact of what works and the insight gained from what doesn’t. Both make the next round more effective.</p>



<p>Say you launched a small pilot with ten users. Within weeks, they’re saving time, finding shortcuts, and surfacing friction you couldn’t predict on paper. That feedback shapes the next version and builds the confidence to expand to a hundred users. Now, you can measure quantitative impact, like faster response times, fewer manual steps, and higher satisfaction. Payoff rapidly scales, as the value curve steepens with each round of improvement.</p>



<p>Moreover, you are collecting measurements of return <em>continuously,</em> using each cycle’s results as evidence to justify the next. In this way, return becomes the trigger for further investment, and the faster the team learns, the faster the return accelerates.</p>



<p>Each step also leaves behind a growing library of reusable assets: validated designs, cleaner data, modular components, and refined decision logic. Together, these assets make the organization smarter and more efficient with each cycle.</p>



<p>When learning and value grow together, ROI becomes a flywheel. Each iteration delivers a product that’s smarter, a team that’s sharper, and an organization more confident in where to invest next. To harness that momentum, we need reliable ways to measure progress and prove that value is growing with every step.</p>



<h2 class="wp-block-heading">Measuring Progress in an Outcome-Driven Model&nbsp;</h2>



<p>When ROI shifts from prediction to evidence, the way we measure progress has to change. Traditional business cases rely on financial projections meant to prove that an investment <em>would</em> pay off. In an outcome-driven model, those forecasts give way to <em>leading indicators</em> collected in real time.</p>



<p>Instead of measuring progress by deliverables and deadlines, we use signals that show we’re moving in the right direction. Each iteration increases confidence that we are solving the right problem, delivering the right outcome, and generating measurable value.</p>



<p>That evidence evolves naturally with the product’s maturity. Early on, we look for <em>behavioral</em> signals, or proof that users see the problem and are willing to change. As traction builds, we measure whether those new behaviors produce the <em>desired outcomes.</em> Once adoption scales, we track how effectively the system converts those outcomes into <em>sustained business value.</em></p>



<p>You can think of it as a chain of evidence that progresses from leading to lagging indicators:</p>



<h3 class="wp-block-heading"><strong>Behavioral Change</strong> → <strong>Outcome Effect</strong> → <strong>Monetary Impact</strong></h3>



<p>The challenge, then, is to create a methodology that exposes these signals quickly and enables teams to move through this progression with confidence, learning as they go. This process conceptually follows agile, but changes as the product evolves through four stages of maturity:</p>



<h3 class="wp-block-heading"><strong>Explore &amp; Prototype</strong> → <strong>Pilot &amp; Validate</strong> → <strong>Scale &amp; Optimize</strong> → <strong>Operate &amp; Monitor</strong></h3>



<p>At each stage, teams iteratively build, test, and learn, advancing only when success is proven. What gets built, how it’s measured, and what “success” means evolve as the product matures. Early stages emphasize exploration and learning; later stages focus on optimizing outcomes and capturing value. Each transition strengthens both evidence that the product works and confidence in where to invest next.</p>
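<p>The stage-gating rule described above can be expressed directly. A minimal sketch: the stage names come from the four-stage maturity model, while the function name and the boolean gate are illustrative.</p>

```python
# Stages of the outcome-driven maturity model, in order.
STAGES = ["explore_prototype", "pilot_validate", "scale_optimize", "operate_monitor"]

def advance(stage: str, success_proven: bool) -> str:
    """Move to the next stage only when the current stage's success
    criteria have been proven; otherwise keep iterating in place."""
    i = STAGES.index(stage)
    if success_proven and i < len(STAGES) - 1:
        return STAGES[i + 1]
    return stage
```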



<p><strong>1. Explore &amp; Prototype:  </strong></p>



<p>In the earliest stage, the goal is to prove potential. Teams explore the problem space, test assumptions, and build quick prototypes to expose what’s worth solving. The success measures are behavioral: evidence of user willingness and intent. Do users engage with early concepts, sign up for pilots, or express frustration with the current process? These signals de-risk demand and validate that the problem matters.</p>



<p>The product moves to the next stage only with a clear, quantified problem statement supported by credible behavioral evidence. When users demonstrate they’re ready for change, the concept is ready for validation.</p>



<p><strong>2. Pilot &amp; Validate:  </strong></p>



<p>Here’s where a prototype turns into a pilot to test whether the proposed solution actually works. Real users perform real tasks in limited settings. The indicators are outcome-based. Can people complete tasks faster, make fewer errors, or reach better results? Each of these metrics ties directly to the intended outcome that the product aims to achieve.</p>



<p>To advance from this stage, the pilot must show measurable progress towards the outcome. When that evidence appears, it’s time to expand.</p>



<p><strong>3. Scale &amp; Optimize:</strong>  </p>



<p>As adoption grows, the focus shifts from proving the concept to demonstrating outcomes and refining performance. Every new user interaction generates evidence that helps teams understand how the product creates impact and where it can improve.</p>



<p>Learning opportunities emerge from volume. Broader usage reveals edge cases, hidden friction points, and variations that allow teams to refine the experience, calibrate models, automate repetitive tasks, and strengthen outcome efficacy.</p>



<p>At this stage, <em>value indicators</em> connect usage to business KPIs like faster response times, higher throughput, improved satisfaction, and lower support costs. This is where value capture compounds. As more users adopt the product, the value they generate accumulates, proving that the system delivers significant business impact.</p>



<p>The product reaches the next level of maturity when it shows <em>sustained, reliable impact</em> on outcome measures across widespread usage.</p>



<p><strong>4. Operate &amp; Monitor:  </strong></p>



<p>In the final stage, the emphasis shifts from optimization to observation. The system is stable, but the environment and user needs continue to evolve and erode effectiveness over time. The goal is twofold: ensure that value continues to be realized and detect the earliest signals of change.</p>



<p>The indicators now focus on <em>sustained ROI and performance integrity.</em> Teams track metrics that show ongoing return (cost savings, revenue contribution, efficiency gains) while monitoring usage patterns, engagement levels, and model accuracy.</p>



<p>When anomalies appear (drift in outcomes, declining engagement, or new behaviors), they become warning signs of changing user needs. Each anomaly hints at a new opportunity and loops the team back into exploration. This begins the next cycle of innovation and validation.</p>



<h2 class="wp-block-heading">From Lifecycle to Flywheel: How ROI Becomes Continuous</h2>



<p>Across these stages, ROI becomes a continuous cycle of evidence that matures alongside the product itself. Each phase builds on the one before it.</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-8f9c98c5054d477e9658a90abc959f44">
<li><strong>Explore &amp; Prototype </strong>creates early confidence that the problem is worth solving. </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-631befc2111aeb3187087554bf42e72f">
<li><strong>Pilot &amp; Validate</strong> proves that the solution works.  </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-069bc8b5d963550348404740393b8c01">
<li><strong>Scale &amp; Optimize</strong> demonstrates measurable outcomes while capturing real business value.  </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-61056c429405311f3bca6d999c45daab">
<li><strong>Operate &amp; Monitor </strong>sustains that value capture and reveals where the next cycle begins. </li>
</ul>



<p>Together, these stages form a closed feedback loop—or flywheel—where <em>evidence guides investment.</em> Every dollar spent produces both impact and insight, and those insights direct the next wave of value creation. The ROI conversation shifts from “Do you believe it will pay off?” to “What proof have we gathered, and what will we test next?”</p>



<h2 class="wp-block-heading">From ROI to Investment Upon Return&nbsp;</h2>



<p>AI and automation have made building easier than ever before. The effort that once defined software development is no longer the bottleneck. What matters now is how quickly we can learn, adapt, and prove that what we build truly works.&nbsp;</p>



<p>In this new environment, ROI becomes a feedback mechanism. Returns are created early, validated often, and reinvested continuously. Each cycle of discovery, testing, and improvement compounds both value and understanding, creating a lasting, continuous advantage.</p>



<p>This requires a mindset shift as much as a process shift: from funding projects based on <em>speculative confidence</em> in a solution to funding them based on their ability to <em>generate proof</em>. When return on investment becomes <em>investment upon return,</em> the economics of software change completely. Value and insight grow together. Risk declines with every iteration.</p>



<p>When building becomes easy, <strong>learning fast creates the competitive advantage.</strong></p>



<p><em>The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. </em><a href="https://robotsandpencils.com/contact/" target="_blank" rel="noreferrer noopener"><em>Request an AI briefing.</em></a> </p>



<p></p>



<div class="wp-block-group is-content-justification-left is-layout-constrained wp-container-core-group-is-layout-8c890d92 wp-block-group-is-layout-constrained">
<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p></p>



<h2 class="wp-block-heading">The New Equations&nbsp;</h2>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-265e52a920829c4bef090ae25cd3b2df">
<li>Predictive ROI → Evidential ROI </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-92d8b7cc188251b76093593b8348e5cc">
<li>Features as Value → Outcomes as Value </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-95bfdc93092617d9b908c36c596becc2">
<li>Delivery Success → Learning Success </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-7d0bfc3e80836a892d4885373f7abec1">
<li>Fixed Scope → Scaled Confidence </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-80bd70b11911c244ca0ebcbb2f88033e">
<li>Return on Investment → Return on Insight <br></li>
</ul>
</div>



<h2 class="wp-block-heading">Key Takeaways</h2>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-f20f72918cf4c9f1f437205b7507541a">
<li><strong>AI-assisted development</strong> has made building software fast, affordable, and repeatable, shifting the value equation toward validation and learning. </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-6af2cab538903c264f0b14cd5afaad4e">
<li><strong>Evidential ROI</strong> replaces predictive ROI, using proof over projection to guide investment and strategy. </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-930404cdda95567bb6c77e76b1da9c06">
<li><strong>Iterative learning</strong> turns every sprint into calibration, where teams advance by testing, validating, and refining in real time. </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-6faf5af5b9a112a8be412b3a4b9374be">
<li><strong>Return on Learning</strong> measures how fast teams adapt and evolve, while <strong>Return on Ecosystem</strong> tracks how insights spread across an organization. </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-f029fdbbcd2b4d35f0dac2a0197abbba">
<li>The new competitive advantage lies in <strong>learning speed</strong>, not build speed. Those who learn faster deliver greater long-term value. <br></li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">FAQs  </h2>



<p><strong>What does “software cheap enough to waste” mean?</strong><br>It describes a new phase in software development where AI and automation have made building fast, low-cost, and low-risk, allowing teams to experiment more freely and learn faster.</p>



<p><strong>Why does cheaper software matter for innovation?</strong><br>When building is inexpensive, experimentation becomes affordable. Teams can test more ideas, learn from data, and refine products that actually work for people.</p>



<p><strong>How does this change ROI in software development?</strong><br>Traditional ROI measured delivery and cost efficiency. Evidential ROI measures learning, outcomes, and validated impact: value that grows with each iteration.</p>



<p><strong>What are Return on Learning and Return on Ecosystem?</strong><br>Return on Learning measures how quickly teams adapt and improve through cycles of experimentation. Return on Ecosystem measures how insights spread and create shared success across teams.</p>



<p><strong>What’s the main takeaway for leaders?</strong><br>AI and automation have changed the rules. The winners will be those who learn the fastest, not those who build the most.</p>
<p>The post <a href="https://robotsandpencils.com/softwares-cheap-enough-to-waste/">Software’s Biggest Breakthrough Was Making It Cheap Enough to Waste </a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The $150K PDF That Nobody Reads: From Research Deliverables to Living Systems </title>
		<link>https://robotsandpencils.com/generative-research-ai-living-systems/</link>
		
		<dc:creator><![CDATA[Tyler Klein]]></dc:creator>
		<pubDate>Mon, 08 Sep 2025 19:06:25 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Strategy]]></category>
		<category><![CDATA[UX]]></category>
		<guid isPermaLink="false">https://robotsandpencils.com/?p=3016</guid>

					<description><![CDATA[<p>A product executive slides open her desk drawer. Tucked between old cables and outdated business cards is a thick, glossy report. The binding is pristine, the typography immaculate, the insights meticulously crafted. Six figures well spent, at least according to the invoice. Dust motes catch the light as she lifts it out: a monument to [&#8230;]</p>
<p>The post <a href="https://robotsandpencils.com/generative-research-ai-living-systems/">The $150K PDF That Nobody Reads: From Research Deliverables to Living Systems </a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>A product executive slides open her desk drawer. Tucked between old cables and outdated business cards is a thick, glossy report. The binding is pristine, the typography immaculate, the insights meticulously crafted. Six figures well spent, at least according to the invoice. Dust motes catch the light as she lifts it out: a monument to research that shaped&#8230; nothing, influenced&#8230; no one, and expired the day it was delivered.&nbsp;</p>



<p>It’s every researcher&#8217;s quiet fear. The initiative they poured months of work, a chunk of their sanity, and about a thousand sticky notes into becomes shelf-ware. Just another artifact joining strategy decks and persona posters that never found their way into real decisions.&nbsp;</p>



<p>This is the way research has been delivered for decades, by global consultancies, boutique agencies, and yes, even by me. At $150K a report, it sounds extravagant. But when you consider the sheer effort, the rarity of the talent involved, and the stakes of anchoring business decisions in real customer insight, it’s not hard to see why leaders sign the check.&nbsp;</p>



<p>The issue isn’t the value of the research. It&#8217;s the belief that insights should live in documents at all.&nbsp;</p>



<h2 class="wp-block-heading has-large-font-size">Research as a Living System&nbsp;</h2>



<p>Now picture a different moment. The same executive doesn’t reach for a drawer. She opens her laptop and types: <em>“What causes the most friction when ordering internationally?”</em>&nbsp;</p>



<p>Within seconds she’s reviewing tagged quotes from dozens of interviews, seeing patterns of friction emerge, even testing new messaging against synthesized persona responses. The research isn’t locked in a PDF. It’s alive, queryable, and in motion.&nbsp;</p>



<p>This isn’t a fantasy. It’s the natural evolution of how research should work: not as one-time deliverables, but as a <em>living system</em>.&nbsp;</p>



<p>The numbers show why change is overdue. <a href="https://www.nngroup.com/articles/why-repositories-fail/" target="_blank" rel="noreferrer noopener">Eighty percent</a> of Research Ops &amp; UX professionals use some form of research repository, but over half reported fair or poor adoption. The tools are frustrating, time consuming to maintain, and lack ownership. Instead of mining the insights they already have, teams commission new studies, resulting in an expensive cycle of creating artifacts that sit idle, while decisions move on without them.&nbsp;</p>



<h2 class="wp-block-heading has-large-font-size">It&#8217;s a Usability Problem&nbsp;</h2>



<p>Research hasn’t failed because of weak insights. It’s been constrained by the static format of reports. Once findings are bound in a PDF or slide deck, the deliverable has to serve multiple audiences at once, and it starts to bend under its own weight.&nbsp;</p>



<p>For executives, the executive summary provides a clean snapshot of findings. But when the time comes to make a concrete decision, the summary isn’t enough. They have to dive into the hundred-page appendix to trace back the evidence, which slows down the moment of action.&nbsp;</p>



<p>On the other hand, product teams don’t need summaries, they need detailed insights for the feature they’re building <em>right now</em>. In long static reports, those details are often buried or disconnected from their workflow. Sometimes they don’t even realize the answer exists at all, so the research goes unused, or even gets repeated. An insight that can’t be surfaced when it’s needed might as well not exist.&nbsp;</p>



<p>The constraint isn’t the quality of the research. It’s the format. Static deliverables fracture usability across audiences and leave each group working harder than they should to put insights into play.&nbsp;</p>



<h2 class="wp-block-heading has-large-font-size">Research as a Product&nbsp;</h2>



<p>While we usually view research as an input <em>into</em> products, research itself is a product too. And with a product mindset, there is no “final deliverable,” only an evolving body of user knowledge that grows in value over time.&nbsp;</p>



<p>In this model, the researcher acts as a knowledge steward of the user insight &#8220;product,&#8221; curating, refining, and continuously delivering customer insights to <em>their</em> users: the executives, product managers, designers, and engineers who need insights in different forms and at different moments.&nbsp;</p>



<p>Like any product, research needs a roadmap. It has gaps to fill, like user groups not yet heard from, or behaviors not yet explored. It has features to maintain like transcripts, coded data, and tagged insights. And it has adoption goals, because insights only create value when people use them.&nbsp;</p>



<p>This approach transforms reports too. A static deck becomes just a temporary framing of the knowledge that already exists in the system. With AI, you can auto-generate the right “version” of research for the right audience, such as an executive summary for the C-suite, annotations on backlog items for product teams, or a user-centered evaluation for design reviews.&nbsp;</p>



<p>Treating research as a product also opens the door to continuous improvement. A research backlog can track unanswered questions, emerging themes, and opportunities for deeper exploration. Researchers can measure not just delivery (“did we produce quality insights?”) but usage (“did the insights influence a decision?”). Over time, the research “product” compounds in value, becoming a living, evolving system rather than a series of static outputs.&nbsp;</p>



<p>This new model requires a new generation of tools. AI can now cluster themes, surface patterns, simulate persona responses, and expose insights through natural Q&amp;A. AI makes the recomposition of insights into deliverables cheap. That allows us to focus on how <em>our</em> users get the insights they need in the way they need them.&nbsp;</p>
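<p>The “alive and queryable” idea can be made concrete with a toy sketch. Everything here is invented for illustration (the class, tags, and quotes are not from any real tool); a production system would use embeddings and an LLM, but the core shift is the same: evidence indexed for retrieval rather than bound into a document.</p>

```python
# Toy sketch of a queryable insight repository (illustrative only).
# Interview quotes are stored with tags; a query surfaces matching
# evidence instead of someone re-reading a static report.
from collections import defaultdict

class InsightStore:
    def __init__(self):
        self._by_tag = defaultdict(list)  # tag -> list of quotes

    def add(self, quote: str, tags: list[str]) -> None:
        """Index one piece of evidence under each of its tags."""
        for tag in tags:
            self._by_tag[tag.lower()].append(quote)

    def query(self, *tags: str) -> list[str]:
        """Return quotes matching any requested tag, deduplicated in order."""
        seen, results = set(), []
        for tag in tags:
            for quote in self._by_tag.get(tag.lower(), []):
                if quote not in seen:
                    seen.add(quote)
                    results.append(quote)
        return results

store = InsightStore()
store.add("Customs forms are confusing", ["international", "friction"])
store.add("Shipping costs appear too late", ["international", "pricing"])
print(store.query("friction"))
```

<p>Even at this level of simplicity, the executive’s question (“What causes the most friction when ordering internationally?”) becomes a lookup against living evidence rather than a hunt through an appendix.</p>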



<h2 class="wp-block-heading has-large-font-size">From Deliverable to Product&nbsp;</h2>



<p>Treating research as a product changes the central question. It’s no longer, <em>“What should this report contain?”</em> but <em>“What questions might stakeholders need to answer, and how do we make those answers immediately accessible?”</em>&nbsp;</p>



<p>When research is built for inquiry, every transcript, survey, and usability session becomes part of a living knowledge base that compounds in value over time. Success shifts too: not in the number of reports delivered, but in how often insights are pulled into decisions. A six-figure investment should inform hundreds of critical choices, not one presentation that fades into archives.&nbsp;</p>



<p>And here’s the irony: the product mindset actually produces better reports as well. When purpose-built reports focus as much on their usage as the information they contain, they become invaluable components of the software production machine.&nbsp;</p>



<p>Research itself isn’t broken. It just needs a product mindset and AI-based qualitative analysis tools that turn insights into a <em>living system</em>, not a slide deck.</p>



<p>Next in the series, we look at two more shifts: <a href="https://robotsandpencils.com/how-ai-ends-the-depth-vs-breadth-research-tradeoff/" target="_blank" rel="noreferrer noopener">AI removing the depth vs. breadth constraint</a>, and <a href="https://robotsandpencils.com/multi-actor-research-ai-agents/" target="_blank" rel="noreferrer noopener">the rise of agents as research participants.</a></p>



<p><em>The pace of AI change can feel relentless with tools, processes, and practices evolving almost weekly. We help organizations navigate this landscape with clarity, balancing experimentation with governance, and turning AI’s potential into practical, measurable outcomes. If you’re looking to explore how AI can work inside your organization—not just in theory, but in practice—we’d love to be a partner in that journey. </em><a href="https://robotsandpencils.com/contact/" target="_blank" rel="noreferrer noopener"><em>Request a strategy session.</em></a></p>



<p></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading has-large-font-size">Key Takeaways</h2>



<ul class="wp-block-list">
<li class="has-black-color has-text-color has-link-color has-medium-font-size wp-elements-292a4c5e20cc85336d8363038c38c8c0">Traditional research deliverables, like lengthy reports and slide decks, often expire the moment they are delivered, leaving insights unused.</li>



<li class="has-black-color has-text-color has-link-color has-medium-font-size wp-elements-0f12f81f09fe751a1239df75ebc30a65">The problem is not weak research but static formats that fracture usability across executives, product teams, and designers.</li>



<li class="has-black-color has-text-color has-link-color has-medium-font-size wp-elements-52fe475c6576cc5028fd1fbbb02276cc">Treating research as a product reframes it as a living system: evolving, queryable, and compounding in value over time.</li>



<li class="has-black-color has-text-color has-link-color has-medium-font-size wp-elements-89506f78b56ce3a6e9e9872577add0b5">With a product mindset, researchers become knowledge stewards, curating and delivering insights in forms tailored to each audience.</li>



<li class="has-black-color has-text-color has-link-color has-medium-font-size wp-elements-35ba78e789903396fe35cc183f89569d">AI enables this shift by clustering themes, surfacing patterns, and recomposing deliverables dynamically, making insights immediately accessible.</li>
</ul>



<p></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading has-large-font-size">FAQs</h2>



<p><strong>What is the problem with traditional research reports?</strong><br>Traditional reports often serve as static artifacts. Once published, they struggle to meet the needs of multiple audiences and quickly become outdated, limiting their impact on real decisions.</p>



<p><strong>Why is research often underutilized in organizations?</strong><br>Research is underutilized because its insights are locked in formats like PDFs or decks. Executives, product teams, and designers often cannot access the right detail at the right time, so findings go unused or studies are repeated.</p>



<p><strong>What does it mean to treat research as a product?</strong><br>Treating research as a product means building a continuously evolving knowledge base rather than one-time deliverables. Insights are curated, updated, and delivered in forms that align with the needs of different stakeholders.</p>



<p><strong>How does AI support this new model?</strong><br>AI makes it possible to cluster themes, surface weak signals, and generate audience-specific deliverables on demand. This reduces maintenance overhead and ensures insights are always accessible when needed.</p>



<p><strong>What role do researchers play in this model?</strong><br>Researchers become knowledge stewards, ensuring the insight “product” is accurate, relevant, and continuously improved. Their work shifts from producing final reports to curating and delivering insights that compound in value over time.</p>



<p><strong>How does this benefit organizations?</strong><br>Organizations gain faster, more confident decision-making. A six-figure research investment can inform hundreds of decisions, rather than fading after a single presentation.</p>
<p>The post <a href="https://robotsandpencils.com/generative-research-ai-living-systems/">The $150K PDF That Nobody Reads: From Research Deliverables to Living Systems </a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Pilot, Protect, Produce: A CIO’s Guide to Adopting AI Code Tools </title>
		<link>https://robotsandpencils.com/pilot-protect-produce-a-cios-guide-to-adopting-ai-code-tools/</link>
		
		<dc:creator><![CDATA[Tyler Klein]]></dc:creator>
		<pubDate>Mon, 11 Aug 2025 16:06:40 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[Strategy]]></category>
		<category><![CDATA[UX]]></category>
		<guid isPermaLink="false">https://robotsandpencils.com/?p=2955</guid>

					<description><![CDATA[<p>How to responsibly explore tools like GitHub Copilot, Claude Code, and Cursor—without compromising privacy, security, or developer trust&#160; AI-assisted development isn’t a future state. It’s already here. Tools like GitHub Copilot, Claude Code, and Cursor are transforming how software gets built, accelerating boilerplate, surfacing better patterns, and enabling developers to focus on architecture and logic [&#8230;]</p>
<p>The post <a href="https://robotsandpencils.com/pilot-protect-produce-a-cios-guide-to-adopting-ai-code-tools/">Pilot, Protect, Produce: A CIO’s Guide to Adopting AI Code Tools </a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="has-large-font-size">How to responsibly explore tools like GitHub Copilot, Claude Code, and Cursor—without compromising privacy, security, or developer trust&nbsp;</p>



<p>AI-assisted development isn’t a future state. It’s already here. Tools like GitHub Copilot, Claude Code, and Cursor are transforming how software gets built, accelerating boilerplate, surfacing better patterns, and enabling developers to focus on architecture and logic over syntax and scaffolding.&nbsp;</p>



<p>The productivity upside is real. But so are the risks.&nbsp;</p>



<p>For CIOs, CTOs, and senior engineering leaders, the challenge isn’t whether to adopt these tools—it’s how. Because without the right strategy, what starts as a quick productivity gain can turn into a long-term governance problem.&nbsp;</p>



<p>Here’s how to think about piloting, protecting, and operationalizing AI code tools so you move fast, without breaking what matters.&nbsp;</p>



<p class="has-large-font-size">Why This Matters Now&nbsp;</p>



<p>In a recent survey of more than 1,000 developers, <a href="https://codesignal.com/report-developers-and-ai-coding-assistant-trends" target="_blank" rel="noreferrer noopener">81% of engineers reported using AI assistance in some form, and 49% reported using AI-powered coding assistants daily</a>. Adoption is happening organically, often before leadership even signs off. The longer organizations wait to establish usage policies, the more likely they are to lose visibility and control.&nbsp;</p>



<p>On the other hand, overly restrictive mandates risk boxing teams into tools that may not deliver the best results and limit experimentation that could surface new ways of working.&nbsp;</p>



<p>This isn’t just a tooling decision. It’s a cultural inflection point.&nbsp;</p>



<p class="has-large-font-size">Understand the Risk Landscape&nbsp;</p>



<p>Before you scale any AI-assisted development program, it’s essential to map the risks:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-6847009d1033aea29f7e68caff286bb1">
<li><strong>Data leakage</strong>: Code snippets may contain proprietary logic or PII. With some tools, there&#8217;s a risk that these are logged, transmitted, or even used in model training.&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-f108d31cfcbe44e521caf5210cc51d7e">
<li><strong>Telemetry and usage tracking</strong>: Many tools send back usage metadata, which could raise compliance or IP concerns in regulated environments.&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-a990d0be8ae596852876574b085fc439">
<li><strong>Model transparency</strong>: Enterprise IT teams often have limited visibility into how third-party LLMs are trained or updated.&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-78ab02a7f2fcfa4d65b05b83d9f684f1">
<li><strong>Token costs</strong>: High-volume usage of external LLMs like Anthropic’s Claude or OpenAI’s GPT-4 can drive significant costs if left unmonitored.&nbsp;</li>
</ul>



<p>These aren’t reasons to avoid adoption. But they are reasons to move intentionally with the right boundaries in place.&nbsp;</p>






<p class="has-large-font-size"><strong>Protect First: Establish Clear Guardrails</strong>&nbsp;</p>



<p>A successful AI coding tool rollout begins with protection, not just productivity. As developers begin experimenting with tools like Copilot, Claude, and Cursor, organizations must ensure that underlying architectures and usage policies are built for scale, compliance, and security.&nbsp;</p>



<p>Consider:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-533d13f9cb14352dbf395f2c812cb92a">
<li><strong>Private repo isolation</strong>: Restrict tool access to non-sensitive codebases or open-source contributions during pilot phases.&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-3be81a8abc28a645306cd38ee1bc9ef8">
<li><strong>In-house proxies or middle layers</strong>: Route prompt traffic through approved gateways that monitor or sanitize inputs.&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-bc9c6e8fb69b4b7b0e5f71eaf2a04244">
<li><strong>Enterprise contracts over consumer logins</strong>: Ensure tools used by developers are under organizational agreements with clear data handling terms.&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-53bd5402273e53b7ae5a96e4b75eeefd">
<li><strong>LLM containment strategies</strong>: For high-sensitivity environments, explore containerized models or fully managed options through secure platforms like Amazon Bedrock. Bedrock enables teams to use leading foundation models, including Anthropic&#8217;s Claude, within an enterprise-grade boundary, with no risk of model training leakage.&nbsp;</li>
</ul>



<p>For teams ready to push further, Bedrock AgentCore offers a secure, modular foundation for building scalable agents with memory, identity, sandboxed execution, and full observability, all inside AWS. Combined with S3 Vector Storage, which brings native embedding storage and cost-effective context management, these tools unlock a secure pathway to more advanced agentic systems. </p>



<p>Most importantly, create an internal AI use policy tailored to software development. It should define tool approval workflows, prompt hygiene best practices, acceptable use policies, and escalation procedures when unexpected behavior occurs.&nbsp;</p>



<p>These aren’t just technical recommendations; they’re prerequisites for building trust and control into your AI adoption journey.&nbsp;</p>



<p class="has-large-font-size">Pilot Intentionally&nbsp;</p>



<p>Start with champion teams who can balance experimentation with critical evaluation. Identify low-risk use cases that reflect a variety of workflows: bug fixes, test generation, internal tooling, and documentation.&nbsp;</p>



<p>Track results across three dimensions:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-5289e55be413d60060154fd4eb186ae5">
<li><strong>Developer experience</strong>: Does the tool actually help, or does it create new friction?&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-7b79cf97a77f256a930dad36214c6188">
<li><strong>Code quality</strong>: Are generated suggestions valid, performant, and secure?&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-911d8ff59a585dfc933f6e0bb579415a">
<li><strong>Team patterns</strong>: How do developers prompt? What guardrails do they naturally adopt or ignore?&nbsp;</li>
</ul>
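<p>The three dimensions above can be captured with a simple observation log. This sketch invents its own field names and proxy metrics; a real pilot would align them with your review tooling and survey data.</p>

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class PilotObservation:
    """One logged AI-assisted change; fields are illustrative."""
    tool: str                  # e.g. "copilot", "cursor"
    helped: bool               # developer experience: net help or friction?
    suggestion_accepted: bool  # code quality proxy: survived review?
    prompt_pattern: str        # team patterns: how the developer prompted

def summarize(observations: list[PilotObservation]) -> dict:
    """Roll pilot logs up into the three tracking dimensions."""
    return {
        "helpfulness_rate": mean(o.helped for o in observations),
        "acceptance_rate": mean(o.suggestion_accepted for o in observations),
        "prompt_patterns": sorted({o.prompt_pattern for o in observations}),
    }
```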



<p>Encourage developers to contribute usage insights and prompt examples. This creates the foundation for internal education and tooling norms.&nbsp;</p>



<p class="has-large-font-size">Don’t Just Test—Teach&nbsp;</p>



<p>AI coding tools don’t replace development skills; they shift where those skills are applied. Prompt engineering, semantic intent, and architectural awareness become more valuable than line-by-line syntax.&nbsp;</p>



<p>That means education can’t stop with the pilot. To operationalize safely:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-eeb82d486c514e3483a68ec70b1ab8ec">
<li>Embed coaching into code reviews (e.g., flagging unsafe prompt usage)&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-48c716ab5ebd1a3eaf7cc8f2ded5a803">
<li>Create internal wikis or LLM-safe prompt libraries&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-f2236cdee0687691e5016c7f7b65a254">
<li>Train tech leads on where generation helps and where it hurts </li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-721b18e5a904eb2788ff69d153a0a180">
<li>Build reusable workflows for common AI development scenarios&nbsp;</li>
</ul>



<p>When used well, these tools amplify good developers. When used poorly, they obscure problems and <a href="https://robotsandpencils.com/beyond-story-points-rethinking-software-engineering-productivity-in-the-age-of-ai/">inflate false productivity</a>. Training is what makes the difference.&nbsp;</p>



<p class="has-large-font-size">Produce with Confidence&nbsp;</p>



<p>Once you&#8217;ve piloted responsibly and educated your teams, you&#8217;re ready to operationalize with confidence. That means:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-3a5ce779c7eb54ff9c14f6ff38a80f4f">
<li>Defining tool selection criteria for different project types&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-1dc06a3f6e651c7d78c4b97de6329397">
<li>Monitoring token usage and LLM cost impact&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-217ee0e59d09f0838e39edd8ff491a8b">
<li>Establishing a feedback loop between engineering, IT, and security&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-e459beac0ab38cd6b72caaba5e76e623">
<li>Treating AI-assisted development as an evolving discipline—not a one-time rollout&nbsp;</li>
</ul>
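<p>To make the cost-monitoring bullet concrete, here is a minimal per-call estimator. The model names and per-1K-token rates are invented for illustration; real prices vary by model and region, so pull them from your provider's current price list.</p>

```python
# Illustrative rates per 1K tokens; NOT real pricing.
RATES_PER_1K = {
    "model-a": {"input": 0.003, "output": 0.015},
    "model-b": {"input": 0.0005, "output": 0.0015},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate spend for one call; aggregate these into a budget alert."""
    rate = RATES_PER_1K[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1000
```

<p>Summing these estimates per team or per repository is usually enough to catch runaway usage before the invoice does.</p>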



<p>Organizations that do this well won’t just accelerate development; they’ll build more resilient software teams, ones that understand both what to build and how to orchestrate the right tools to do it. The best engineering leaders won’t mandate one AI tool or ban them altogether. They’ll create systems that empower teams to explore safely, evaluate critically, and build smarter together.&nbsp;</p>



<p class="has-large-font-size">Robots &amp; Pencils: Secure by Design, Built to Scale&nbsp;</p>



<p>At Robots &amp; Pencils, we help enterprise engineering teams pilot AI-assisted development with the right mix of speed, structure, and security. Our preferred LLM provider, Anthropic, was chosen precisely because we prioritize data privacy, source integrity, and ethical model design: values we know matter to our clients as much as productivity gains. </p>



<p>We’ve been building secure, AWS-native solutions for over a decade, earning recognition as an AWS Partner with a Qualified Software distinction. That means we meet AWS’s highest standards for reliability, security, and operational excellence while helping clients adopt tools like Copilot, Claude Code, and Cursor safely and strategically.&nbsp;</p>



<p>We don’t just plug in AI; we help you govern it, contain it, and make it work in your world. From guardrails to guidance, we bring the technical and organizational design to ensure your AI tooling journey delivers impact without compromise.&nbsp;</p>
<p>The post <a href="https://robotsandpencils.com/pilot-protect-produce-a-cios-guide-to-adopting-ai-code-tools/">Pilot, Protect, Produce: A CIO’s Guide to Adopting AI Code Tools </a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Designing for the Unpredictable: An Introduction to Emergent Experience Design </title>
		<link>https://robotsandpencils.com/designing-for-the-unpredictable-an-introduction-to-emergent-experience-design/</link>
		
		<dc:creator><![CDATA[Tyler Klein]]></dc:creator>
		<pubDate>Sat, 26 Jul 2025 13:58:52 +0000</pubDate>
				<category><![CDATA[Insights]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Strategy]]></category>
		<category><![CDATA[UX]]></category>
		<guid isPermaLink="false">https://robotsandpencils.com/?p=2890</guid>

					<description><![CDATA[<p>Why Generative AI Requires Us to Rethink the Foundations of User-Centered Design&#160; User-centered design has long been our north star—grounded in research, journey mapping, and interfaces built around stable, observable tasks. It has been methodical, human-centered, and incredibly effective—until now.&#160; LLM-based Generative AI and Agentic Experiences have upended this entire paradigm. These technologies don’t follow [&#8230;]</p>
<p>The post <a href="https://robotsandpencils.com/designing-for-the-unpredictable-an-introduction-to-emergent-experience-design/">Designing for the Unpredictable: An Introduction to Emergent Experience Design </a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="has-large-font-size">Why Generative AI Requires Us to Rethink the Foundations of User-Centered Design&nbsp;</p>



<p>User-centered design has long been our north star—grounded in research, journey mapping, and interfaces built around stable, observable tasks. It has been methodical, human-centered, and incredibly effective—until now.&nbsp;</p>



<p>LLM-based Generative AI and Agentic Experiences have upended this entire paradigm. These technologies don’t follow predefined scripts. Their interfaces aren’t fixed, their user journeys can’t be mapped, and their purpose unfolds as interaction happens. The experience doesn’t precede the user—it&nbsp;<em>emerges</em>&nbsp;from the LLM’s interaction&nbsp;<em>with</em>&nbsp;the user.&nbsp;</p>



<p>This shift demands a new design framework—one that embraces unpredictability and builds adaptive systems capable of responding to fluid goals. One that doesn’t deliver rigid interfaces, but&nbsp;<em>scaffolds</em>&nbsp;flexible environments for creativity, productivity, and collaboration. At Robots &amp; Pencils, we call this approach Emergent Experience Design.&nbsp;</p>



<p class="has-large-font-size">The Limits of Task-Based UX&nbsp;</p>



<p>Traditional UX design starts with research that discovers jobs to be done. We uncover user goals, design supporting interfaces, and optimize them for clarity and speed. When the job is known and stable, this approach excels.&nbsp;&nbsp;</p>



<p>But LLM-based systems like ChatGPT aren&#8217;t built for one job. They serve any purpose that can be expressed in language at runtime. The interface isn’t static. It adapts in real time. And the “job” often isn’t clear until the user acts.&nbsp;</p>



<p>If the experience is emergent, our designs need to be as well.&nbsp;</p>



<p class="has-large-font-size">Emergent Experience Design: A UX Framework for Generative AI&nbsp;</p>



<p>Emergent Experience Design is a conceptual design framework for building systems that stay flexible without losing focus. These systems don’t follow scripts—they respond.&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-899a0e68d98013f0c3a2a5d7cc2970d7">
<li>Adapt to user goals in real time&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-f0bd881aaaeead0de0ef42b0559fdf45">
<li>Respond intelligently to unpredictable behavior&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-0092d3f7f48eb60cebfc70cb79e2ff3f">
<li>Stay aligned to intended outcomes without relying on rigid structures&nbsp;</li>
</ul>



<p>To do that, they’re built on three types of components:&nbsp;</p>



<p style="font-size:25px"><strong>1. Open Worlds&nbsp;</strong></p>



<p>Open worlds are digital environments intentionally designed to invite exploration, expression, and improvisation. Unlike traditional interfaces that guide users down linear paths, open worlds provide open-ended sandboxes for users to work freely—adapting to user behavior, not constraining it. They empower users to bring their own goals, define their own workflows, and even invent new use cases that a designer could never anticipate.&nbsp;</p>



<p>To define these worlds, we begin by choosing the <strong>physical or virtual space</strong>—a watch, a phone, a desktop computer, or even smart glasses. Then, we can choose one or more <strong>interaction design metaphors</strong> for that space—a 3D world, a spreadsheet grid, a voice interface, etc. A <strong>design vocabulary</strong> then defines what elements can exist within that world—from atomic design elements like buttons, widgets, cells, images, or custom inputs, to more expressive functionality like drag-and-drop layouts, formula editors, or a dialogue system. </p>



<p>Finally, open worlds are governed by a&nbsp;<strong>set of rules</strong>&nbsp;that control how objects interact. These can be strict (like physics constraints or permission layers) or soft (like design affordances and layout behaviors), but they give the world its internal logic. The more elemental and expressive the vocabulary and rules are, the more varied and creative the user behavior becomes.&nbsp;</p>



<p>Different environments will necessitate different component vocabularies—what elements can be placed, modified, or triggered within the world. By exposing this vocabulary via a structured interface protocol (similar to Model-Context-Protocol, or MCP), LLM agents can purpose-build new interfaces in the world responsively based on the medium. A smartwatch might expose a limited set of compact controls, a desktop app might expose modal overlays, windows or toolbars, and a terminal interface might offer only text-based interactions. Yet from the agent’s perspective, these are just different dialects of the same design language—enabling the same user goal to be rendered differently across modalities. </p>



<p><strong><em>Open worlds don’t prescribe a journey—they provide a landscape. And when these environments are paired with agents, they evolve into living systems that scaffold emergent experiences rather than dictate static ones.</em>&nbsp;</strong></p>



<p style="font-size:25px"><strong>2. Assistive Agents&nbsp;</strong></p>



<p>Assistive agents are the visible, intelligent entities that inhabit open worlds and respond to user behavior in real time. Powered by large language models or other generative systems, these agents act as collaborators—interpreting context, responding to inputs, and acting inside (and sometimes outside) the digital environment. Rather than relying on hardcoded flows or fixed logic, assistive agents adapt dynamically, crafting interactions based on historical patterns and real-time cues.&nbsp;</p>



<p>Each assistive agent can be shaped by two key ingredients:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-38b3dcc60857dbbae315ff76bc6c3651">
<li><strong>Instinct:</strong>&nbsp;The training and architecture of the underlying LLM model, which provides its foundational capabilities. This could include the ability to understand text or image inputs, the language in which it responds, and its underlying reasoning patterns.&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-3f27b411739bd64e0bacd9459063df8c">
<li><strong>Identity:</strong>&nbsp;The purpose and personality assigned through prompt instructions and contextual inputs that shape the agent&#8217;s perspective—what it knows, how it prioritizes information, and how it speaks or acts.&nbsp;</li>
</ul>



<p>These two ingredients work together to shape agent behavior: instinct governs what the model&nbsp;<em>can</em>&nbsp;do, while identity defines what it&nbsp;<em>should</em>&nbsp;do in a given context. Instinct is durable—coded in the model&#8217;s training and architecture—while identity is flexible, applied at runtime through prompts and context. This separation allows us to reuse the same foundation across wildly different roles and experiences, simply by redefining the agent&#8217;s identity.&nbsp;</p>
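<p>The instinct/identity split can be sketched in code: one durable model configuration reused under different runtime identities. All names here (fields, model id, prompts) are illustrative, not a real provider API.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instinct:
    """Durable capabilities baked into the model (illustrative fields)."""
    model_id: str
    modalities: tuple[str, ...]

@dataclass(frozen=True)
class Identity:
    """Purpose and personality assigned at runtime."""
    role: str
    system_prompt: str

def build_agent_request(instinct: Instinct, identity: Identity, user_msg: str) -> dict:
    """Compose one request: identity rides along as the system prompt,
    while instinct only selects the underlying model."""
    return {
        "model": instinct.model_id,
        "system": identity.system_prompt,
        "messages": [{"role": "user", "content": user_msg}],
    }

# The same instinct serves different roles by swapping identity:
base = Instinct("some-llm", ("text", "image"))
tutor = Identity("tutor", "You are a patient math tutor.")
critic = Identity("critic", "You are a terse code reviewer.")
```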



<p>Agents can perceive a wide variety of inputs, from typed prompts or voice commands to UI events and changes in application state—even external signals from APIs and sensors. Increasingly, these agents are also gaining access to formalized interfaces—structured protocols that define what actions can be taken in a system, and what components are available for composition. One emerging standard, the Model-Context-Protocol (MCP) pattern introduced by Anthropic, provides a glimpse of this future: an AI agent can query a system to discover its capabilities, understand the input schema for a given tool or interface, and generate the appropriate response. In the context of UI, this approach should also open the door to agents that can dynamically compose interfaces based on user intent and a declarative understanding of the available design language.&nbsp;</p>



<p>Importantly, while designers shape an agent’s perception and capabilities, they don’t script exact outcomes. This allows the agent to remain flexible, resilient, and able to improvise intelligently in response to emergent user behavior. In this way, assistive agents move beyond simple automation and become adaptive collaborators inside the experience.&nbsp;</p>



<p><strong><em>The designer’s job is not to control every move the agent makes, but to equip it with the right inputs, mental models, and capabilities to succeed.</em>&nbsp;</strong></p>



<p style="font-size:25px"><strong>3. Moderating Agents&nbsp;</strong></p>



<p>Moderating agents are the invisible orchestration layer of an emergent system. While assistive agents respond in real time to user input, moderating agents maintain focus on long-term goals. They ensure that the emergent experience remains aligned with desired outcomes like user satisfaction, data completeness, business objectives, and safety constraints.&nbsp;</p>



<p>These agents function by constantly evaluating the state of the world: the current conversation, the user’s actions, the trajectory of the interaction, and any external signals or thresholds. They compare that state to a defined ideal or target condition, and when gaps appear, they nudge the system toward correction. This could take the form of suggesting a follow-up question to an assistant, prompting clarification, or halting actions that risk ethical violations or user dissatisfaction.&nbsp;</p>



<p>Moderating agents are not rule-based validators. They are adaptive, context-aware entities that operate with soft influence rather than hard enforcement. They may use scoring systems, natural language evaluations, or AI-generated reasoning to assess how well a system is performing against its goals. These agents often manifest through lightweight interventions—such as adjusting the context window of an assistive agent, inserting clarifying background information, reframing a prompt, or suggesting a next step. In some cases, they may even take subtle, direct actions in the environment—but always in ways that feel like a nudge rather than a command. This balance allows moderating agents to shape behavior without disrupting the open-ended, user-driven nature of the experience.&nbsp;</p>



<p>Designers configure moderating agents through clear articulation of intent. This can include writing prompts that define goals, thresholds for action, and strategies for response. These prompts serve as the conscience of the experience—guiding assistants subtly and meaningfully, especially in open-ended contexts where ambiguity is the norm.&nbsp;</p>



<p><strong><em>Moderating agents are how we bring intentionality into systems that we don’t fully control. They make emergent experiences accountable, responsible, and productive without sacrificing their openness or creativity.</em>&nbsp;</strong></p>



<p class="has-large-font-size">From Intent to Interface: The Role of Protocols&nbsp;</p>



<p>The promise of Emergent Experience Design doesn’t stop at agent behavior—it extends to how the experience itself is constructed. If we treat user goals as structured intent and treat our UI vocabulary as a query-able language, then the interface becomes the result of a real-time negotiation between those two forces.&nbsp;</p>



<p>This is where the concept of Model-Context-Protocol becomes especially relevant. Originally defined as a mechanism for AI agents to discover and interact with external tools, MCP also offers a compelling lens for interface design. Imagine every environment—from mobile phones to smartwatches to voice UIs—offering a structured “design language” via an MCP server. Agents could then query that server to discover what UI components are supported, how they behave, and how they can be composed.&nbsp;</p>



<p>A single requirement—say, “allow user to log in”—could be expressed through entirely different interfaces across devices, yet generated from the same underlying intent. The system adapts not by guessing what to show, but by&nbsp;<strong>asking what’s possible</strong>, and then composing the interface from the capabilities exposed. This transforms the role of design systems from static libraries to living protocols, and makes real-time, device-aware interface generation not just feasible, but scalable.&nbsp;</p>
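<p>That negotiation can be sketched as a capability query: ask the environment what components it exposes, then compose the first login strategy it fully supports. Component names and device profiles are invented for illustration.</p>

```python
# Toy "design language" each environment exposes (hypothetical).
CAPABILITIES = {
    "desktop": {"modal", "text_field", "button", "oauth_popup"},
    "watch":   {"pin_pad", "button"},
    "voice":   {"speech_prompt"},
}

# Preferred component stacks for the "allow user to log in" intent, best first.
LOGIN_STRATEGIES = [
    ["modal", "text_field", "button"],  # full credential form
    ["pin_pad"],                        # compact numeric entry
    ["speech_prompt"],                  # spoken verification
]

def compose_login(device: str) -> list[str]:
    """Ask what's possible, then pick the first fully supported strategy."""
    available = CAPABILITIES[device]
    for strategy in LOGIN_STRATEGIES:
        if set(strategy) <= available:
            return strategy
    raise ValueError(f"no login strategy fits {device}")
```

<p>The same intent yields a form on desktop, a PIN pad on a watch, and a spoken prompt on a voice UI, composed rather than guessed.</p>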



<p class="has-large-font-size">A Mindset Shift for Designers&nbsp;</p>



<p>In this new paradigm, interfaces are no longer fixed blueprints. They are assembled at runtime based on emerging needs. Outcomes are not guaranteed—they are negotiated through interaction. And user journeys are not mapped—they are discovered as they unfold. This dynamic, improvisational structure demands a design framework that embraces fluidity without abandoning intention.&nbsp;</p>



<p>As designers, we have to move from being architects of static interfaces to being cultivators of digital ecosystems. Emergent Experience Design is the framework that lets us shape the tools and environments where humans co-create with intelligent assistants. Instead of predicting behavior, we guide it. Instead of controlling the path, we shape the world.&nbsp;</p>



<p class="has-large-font-size">Why It Matters&nbsp;</p>



<p>Traditional UX assumes we can observe and anticipate user goals, define the right interface, and guide people efficiently from point A to B. That worked—until GenAI changed the rules.&nbsp;</p>



<p>In agentic systems, intent is fluid. Interfaces are built on the fly. Outcomes aren’t hard-coded—they unfold in the moment. That makes our current design models brittle. They break under uncertainty.&nbsp;</p>



<p><strong>Emergent Experience Design gives us a new toolkit.</strong> It helps us move from building interfaces for predefined jobs to crafting systems that automate discovery, collaboration, and adaptation in real time.&nbsp;</p>



<p>With this framework, we can:&nbsp;</p>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-d4ec4075b5ccb447a06dbe8fad695dac">
<li>Meet users where they are—not where we expect them to be&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-40998103deac1ab666fe10224c9ff68c">
<li>Guide them through complex systems with responsive, context-aware support&nbsp;</li>
</ul>



<ul class="wp-block-list has-black-color has-text-color has-link-color has-medium-font-size wp-elements-f1cf1bf21f1e0003f57b5aae6780b1fd">
<li>Preserve creativity, flexibility, and human agency at every step&nbsp;</li>
</ul>



<p>In short: it lets us design <strong>with</strong> the user, not just <strong>for</strong> them. And in doing so, it unlocks entirely new categories of experience—ones too dynamic to script, and too valuable to ignore.&nbsp;</p>
<p>The post <a href="https://robotsandpencils.com/designing-for-the-unpredictable-an-introduction-to-emergent-experience-design/">Designing for the Unpredictable: An Introduction to Emergent Experience Design </a> appeared first on <a href="https://robotsandpencils.com">Robots &amp; Pencils</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
