Heard us on The AI Daily Brief Podcast? For AI, we're all in on AWS. Let's build AI co-workers for your team.

AWS Summit Warsaw 2026: What We Saw, Who We Met, and What It Confirmed 

Our Ukraine team spent May 6 at the AWS Summit (EXPO XXI) in Warsaw. Here is what we saw, what surprised us, and what we are bringing back. 

EXPO XXI is a short ride from the center of Warsaw, and on a May morning you can feel the conference before you see it. The queue outside was long but moving fast. Hoodies, lanyards, laptop bags. The crowd skewed more senior than you might expect at a free regional event.  

Our plan was deliberate. Show up early, split the agenda, cover more ground in parallel, regroup over coffee, and most importantly, validate our approach.  

Robots & Pencils team members in attendance, from left to right: Stanislav Makar, Rostyslav Volskyi, and Bohdan Popovych. 

Agentic AI Is the AWS Headline 

The opening keynote made one thing clear. Agentic AI is the organizing thesis for everything AWS is building in 2026. 

Three names anchored the story. Kiro, the agentic IDE that got a fresh push at re:Invent 2025, featured prominently with its spec-driven development model, sequenced task generation, and agents that produce tests alongside code. Nova 2, the model powering more of the AWS AI surface, continues its region-by-region rollout. AWS Transform, their modernization platform for mainframe, VMware, and .NET workloads, was framed as the agentic path into enterprise legacy systems. 

Real customer stories on stage. Real numbers. Real screenshots. The European Sovereign Cloud and the EMEA AI Hub got dedicated time, which landed well with the Warsaw audience. The framing was consistent throughout: the shift from AI tools you prompt to AI agents that reason, plan, and act is underway. The question for builders is how you instrument, evaluate, and trust what those agents do. 

That question got a very good answer in the next session. 

The Session That Landed: AgentCore Evaluations in Production 

Timing matters at a conference, and the AgentCore deep-dive landed at exactly the right moment. AWS spent the spring pushing AgentCore Evaluations hard. It went GA on March 31, 2026, and the Warsaw session put it directly in front of European builders. 

The plain-language version of what it does: a managed service that continuously monitors agent quality against real production traces, not just test suites. You are shipping agents. You need to know they work. Handing someone a scorecard you hand-rolled for each project is not a sustainable answer. This is. 

The built-in evaluators cover what matters in production. 

On top of those you can configure custom evaluators. LLM-as-judge with your own prompt and model, or code-based evaluators running on Lambda. The same framework handles hallucination detection and JSON schema validation without forcing two different toolchains. 
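To make the code-based option concrete, here is a minimal sketch of what such an evaluator could look like. It is not the AgentCore Evaluations contract: the `agent_output` event field, the required fields in the schema, and the `score`/`reason` keys in the result are all assumptions made for illustration. Only the `handler(event, context)` entry-point shape is standard for a Python Lambda function.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"ticket_id": str, "priority": int, "summary": str}


def evaluate_output(raw_output: str) -> dict:
    """Score 1.0 if the agent output is valid JSON with the expected fields, else 0.0."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return {"score": 0.0, "reason": f"invalid JSON: {exc.msg}"}

    if not isinstance(payload, dict):
        return {"score": 0.0, "reason": "top-level value is not an object"}

    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            return {"score": 0.0, "reason": f"missing field: {field}"}
        if not isinstance(payload[field], expected_type):
            return {"score": 0.0, "reason": f"wrong type for field: {field}"}

    return {"score": 1.0, "reason": "ok"}


def handler(event, context):
    """Lambda-style entry point; the 'agent_output' key is an assumed event field."""
    return evaluate_output(event.get("agent_output", ""))
```

Returning a reason next to the score keeps a failed check explainable when it shows up on a dashboard later.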

The detail that made us lean forward: full OpenTelemetry compatibility. The evaluator scores flow into existing dashboards alongside session count, latency, token usage, and error rates. You can alert on agent quality the same way you alert on a CPU spike. 
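The "alert on quality like a CPU spike" idea can be sketched in a few lines. This is a plain-Python illustration of a rolling-mean threshold, not an OpenTelemetry or CloudWatch configuration. The class name, the 0.8 threshold, and the window size are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean


class QualityAlert:
    """Fire when the rolling mean of evaluator scores drops below a threshold."""

    def __init__(self, threshold: float = 0.8, window: int = 5):
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # keeps only the most recent scores

    def observe(self, score: float) -> bool:
        """Record one evaluator score; return True if the alert should fire."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        # Stay quiet until the window is full, so one early bad score does not page anyone.
        return window_full and mean(self.scores) < self.threshold
```

In a real setup the same rule would more likely live in the alerting layer that already watches latency and error rates. The sketch only shows that a quality score is just another time series.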

For anyone building agents on behalf of enterprise customers, this solves the credibility problem. “How do you know it works in production” is no longer a hand-waving moment. 

The Best Conversation Happened at the Espresso Machine 

One of the more useful exchanges of the day started while waiting for coffee. 

AWS set up a cloud-ordered espresso bar on the expo floor. You scanned a QR code, placed your order in a small web app, and the espresso machine queued it. When the drink was ready, the screen showed your name. No line. No barista small talk. Beautifully on-brand for a cloud event, and genuinely better than the alternative. 

While we waited, we got talking with a Senior Solutions Architect at AWS. The topic was whether Lambda is a credible runtime for agentic workflows. The honest answer is: it depends on whether you have state. 

An agent is not a request and a response. It is a long, branching workflow with LLM calls, tool invocations, and occasional human-in-the-loop steps. Lambda durable functions, which AWS shipped in late 2025 and has been shaping for agentic use cases since, address this directly. Each LLM call and each tool invocation becomes a checkpointed step inside a single Lambda. If execution times out mid-loop, the next invocation replays from the last checkpoint and skips completed steps. No Step Functions wiring. No custom state store. No DIY replay logic. The orchestration lives in the function code, in the language you already use. 
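The checkpoint-and-replay idea is easier to see in code. The sketch below is a toy, not the Lambda durable functions SDK: `DurableRun`, `step`, and the in-memory `store` dictionary are hypothetical stand-ins for whatever the real runtime persists, and the LLM and tool calls are stubs.

```python
from typing import Any, Callable, Dict


class DurableRun:
    """Toy checkpoint-and-replay runner; `store` stands in for durable storage."""

    def __init__(self, store: Dict[str, Any]):
        self.store = store

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # A completed step is never re-executed: replay returns its saved result.
        if name in self.store:
            return self.store[name]
        result = fn()
        self.store[name] = result  # checkpoint before moving on
        return result


def agent_workflow(run: DurableRun, calls: list, fail_at_tool: bool = False) -> str:
    """Two-step agent loop: a stubbed LLM call, then a stubbed tool call."""

    def call_llm() -> str:
        calls.append("llm")
        return "look up order 42"

    def call_tool() -> str:
        calls.append("tool")
        if fail_at_tool:
            raise TimeoutError("simulated timeout mid-loop")
        return "order 42: shipped"

    plan = run.step("plan", call_llm)
    observation = run.step("lookup", call_tool)
    return f"{plan} -> {observation}"


def demo():
    store, calls = {}, []
    try:
        agent_workflow(DurableRun(store), calls, fail_at_tool=True)
    except TimeoutError:
        pass  # first invocation dies after the LLM step was checkpointed
    result = agent_workflow(DurableRun(store), calls)  # second invocation replays
    return calls, result
```

In `demo()`, the second invocation skips the LLM call because its result is already in the store, and only the tool call runs again. That is the property that matters for agents: the expensive, non-deterministic steps are not repeated after a timeout.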

The Java SDK went GA in April 2026. Durable functions are now available in sixteen additional regions. 

The Best Hour of the Day: Knowledge Graphs 

Two talks on knowledge graphs stood out as the strongest technical content of the summit. The first was delivered by Dmytro Romantsov, Senior SRE at Miro, on their internal AI agent built over an organizational graph. The talk was technically dense and honest: he walked through what failed before the team settled on a graph-backed architecture, what the graph actually contains, how updates flow into it, and where the approach delivers measurably better results than the pre-graph baseline. 

After the session, we walked over to talk to him. Small-world moment: we share a first language, switched off English immediately, and the conversation opened up. The core thesis from both the talk and the follow-up conversation was consistent. Enterprise AI agents are only as good as the organizational knowledge they can reason over. A graph gives that knowledge structure, updateability, and query depth that flat retrieval cannot match. That is not a new idea, but watching it validated independently at Miro’s scale makes the argument more concrete. 

The second strong graph talk came from an SLB engineer in DEV207, on context graphs for explainable AI agents. The framing that stuck: the difference between a state clock and an event clock. Most pipelines today reflect the current state of a system. A context graph that also captures decision events can answer “why did this happen, and in what order.” That is the kind of explainability enterprise buyers are starting to require as agents move from pilot to live. 
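The state clock versus event clock distinction can be shown with a small sketch. This is our own simplification, not the design presented in the talk: a dictionary keyed by entity stands in for a real graph store, and the names `ContextGraph`, `DecisionEvent`, `record`, and `why` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DecisionEvent:
    seq: int      # position on the event clock
    actor: str
    action: str
    reason: str


@dataclass
class ContextGraph:
    # State clock: only the latest value per entity survives.
    state: Dict[str, str] = field(default_factory=dict)
    # Event clock: every decision is kept, in order.
    events: Dict[str, List[DecisionEvent]] = field(default_factory=dict)
    _seq: int = 0

    def record(self, entity: str, actor: str, action: str, reason: str) -> None:
        self._seq += 1
        self.state[entity] = action
        self.events.setdefault(entity, []).append(
            DecisionEvent(self._seq, actor, action, reason)
        )

    def why(self, entity: str) -> List[str]:
        """Ordered explanation of how `entity` reached its current state."""
        return [
            f"{e.seq}: {e.actor} -> {e.action} ({e.reason})"
            for e in sorted(self.events.get(entity, []), key=lambda e: e.seq)
        ]
```

Reading `state` tells you what is true now. Calling `why` tells you who decided what, in which order, and on what grounds, which is the part a state-only pipeline throws away.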

Asking Honest Questions About AWS Transform 

The AWS Transform booth was busy. The team arrived with a direct question about IBM RPG support and walked through the answer methodically with a Solutions Architect for Migration and Modernization at AWS. 

The most telling moment was watching an AWS specialist type the same question into their own tool in front of us. The answer came back: yes, with limitations, followed by pages of caveats. Informative in its own way. 

The bottom line is that AWS Transform is production-grade for COBOL, Java-to-JavaScript migrations, VMware modernization, and mainframe workloads. RPG support is real but not ready for complex production use cases. We left with clarity on where the tool genuinely shines and where the right path is a combination of other tools and hand-rolled pipelines. That kind of honest answer is the second-best outcome at a conference. It tells you your reasoning was sound. 

The VMware migration angle, by contrast, is genuinely strong. Broadcom’s license changes are creating real urgency for customers running on VMware infrastructure. Worth flagging for relevant engagements. 

The Compute Thesis: AWS is Sizing Infrastructure for Self-Managed AI 

A theme ran underneath the agentic-AI headline all day: AWS is provisioning compute to match the shape of AI demand, and the demand right now for these kinds of workloads is high. 

Two sessions made the same point from opposite ends of the price spectrum. Comarch walked through a real migration from x86 to AWS Graviton-based instances, with meaningful cost reductions and measured performance gains. The honest part of their talk: Graviton is not a flag flip. If you have native code, JNI bindings, or JIT-tuned hotspots, you pay for the migration before you see the savings. 

On the other end of the spectrum: Meta’s agreement to deploy AWS Graviton processors at scale, starting with tens of millions of Graviton cores, announced ten days before the summit and explicitly framed around CPU-intensive agentic AI workloads — real-time reasoning, code generation, and multi-step task orchestration. 

For Robots & Pencils, this opens a third option alongside Bedrock and direct provider APIs. For clients with data-residency constraints, predictable high-volume workloads, or smaller open-weight models where managed-API margins make self-managed attractive, the playbook is now well-documented and accessible. Independent benchmarks on Llama 3.1 8B have Graviton4 delivering roughly 2x the tokens per dollar of comparable x86 options for that model class. 
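For anyone who wants to run the same comparison on their own workload, tokens per dollar is simple arithmetic: sustained throughput times 3600 seconds, divided by the hourly instance price. The helper below is a minimal sketch; the function name is ours, and any figures you pass in should come from your own measurements and current pricing, not from this post.

```python
def tokens_per_dollar(tokens_per_second: float, hourly_price_usd: float) -> float:
    """Tokens generated per dollar of instance time at sustained throughput."""
    if hourly_price_usd <= 0:
        raise ValueError("hourly price must be positive")
    return tokens_per_second * 3600 / hourly_price_usd


def cost_efficiency_ratio(candidate: float, baseline: float) -> float:
    """How many times more tokens per dollar the candidate delivers than the baseline."""
    return candidate / baseline
```

With placeholder inputs, two instances at the same throughput where one costs half as much per hour give a ratio of 2. The point is only that price and throughput both move the result, so measure both.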

A Practitioner’s Checklist for 2026 

The session that generated the most useful signal for client-facing conversations was DEV209, delivered by Tomasz Dudek, Data and AI Team Lead at Chaos Gears and an AWS Machine Learning Hero. The premise was simple: AI has been mainstream for over three years. He has watched hundreds of Amazon Bedrock projects pass through his hands. Most near-failures trace back to a small set of repeatable mistakes. 

The talk was the inverse of a vendor pitch. Here is exactly how teams stall before the first line of code. Here is what to do instead. He closed with 13 numbered tips for approaching AI projects in 2026. The final line: “Have evals, really.” 

It was good to hear a practitioner at that level land on the same conclusions we have been operating on. The teams doing this work at scale are converging on the same principles, and the list mapped closely to how we already approach agent quality on client engagements. Confirmation from that angle is worth having. 

The Parts That Were Just Fun 

Not everything at a summit is a session worth writing home about. But a few moments, in addition to the Serverlesspresso bar, which was cool enough to warrant a second mention, stood out for the right reasons. 

The AWS Drive Your Data Formula 1 simulator was exactly what it looked like: two Fanatec rigs, full wraparound LED screens, a Canada time-trial, and a results board you could compete on. The pitch underneath was real telemetry and lap analytics. The booth’s job was to draw a crowd, and it absolutely did. The team took turns. 

And the Ukrainian-speaking community was well-represented at this summit. Several familiar-sounding conversations happened in unexpected corners of the expo. That part mattered. 

What the Day Confirmed 

The most useful thing a conference can do is sharpen your picture of where the tools are today versus where they are heading. Warsaw 2026 did that well. 

Agentic AI is no longer a roadmap commitment from AWS. It is the organizing logic of everything they showed. Agent evaluation infrastructure is production-ready and instrumented the way mature engineering teams expect. The compute story has matured to a point where self-hosting is a genuine option for the right workloads, not just a theoretical one. Knowledge graphs as a foundation for enterprise AI agents are getting independent validation at scale. And the practitioners who have been doing this work longest are converging on the same principles around evaluation, quality gates, and shipping agents that are honest about what they know. 

None of that surprised us. All of it was good to see confirmed. 

Warsaw 2026 delivered real technical depth on agentic AI, agent evaluation, and knowledge graphs. The team went in with specific questions and came back with sharper answers, a few useful new contacts, and a strong argument for cloud-ordered coffee at the next internal engineering day. 

Robots & Pencils is an AWS Advanced Tier Services Partner and AWS Pattern Partner. Request an AI Briefing today. 

Written by Bohdan Popovych: Robots & Pencils Ukraine Engineering Manager, Rostyslav Volskyi: AWS Certified Solutions Architect and Amazon Web Services Developer, and Stanislav Makar: AWS Certified Solutions Architect – Professional.  

We Took a Real Problem into the Amazon Quick Hackathon. It Delivered.

I spent last Tuesday at Amazon’s ORD11 office with five colleagues from Robots & Pencils, building on Amazon Quick for the day.  

The Problem We Brought In 

We brought a live use case from one of our enterprise customers, a regulated utility dealing with alarm overload, aging infrastructure they must migrate off by 2028, and the steady departure of the asset experts who know how all of it really works. 

Robots & Pencils at AWS Amazon Quick Hackathon
Photo by Scott Young: Pictured L-R Lisa Bayne, Stefan Deusch, Alex Shumski, Saul Delage, Adrian Bird

What We Built (And What Surprised Me!) 

By the end of the day we had a working end-to-end agentic workflow that includes a dashboard pulling device telemetry into one view, an agent that triages incoming alerts and recommends what to do about them, and a knowledge base that captures the kind of expertise that usually walks out the door when someone retires. It’s nowhere near production, but it’s enough that we are ready to sit down with the customer next week and have a concrete concept discussion instead of a whiteboard one. That’s the part that surprised me most. 

We were also lucky enough to be recognized as one of the winning partners on the day, which was a nice bonus. 

Photo by Scott Young: Pictured L-R Lisa Bayne, Stefan Deusch, Adrian Bird, Alex Shumski, Saul Delage

A Few Thanks 

A few thanks are in order. Naresh Rajaram, Sr. Partner Solutions Architect at AWS, ran a genuinely well-organized event. Every detail was thought through. Neal Cauley’s framing of where Amazon Quick is heading was probably the most useful 30 minutes of the day for me, and it connected back to what Rima Olinger, World Wide Director Data & AI GTM – Amazon Quick, has been sharing publicly about how Amazon itself is using the product internally. Worth reading if you haven’t. Thanks also to the AWS team for inviting us and to the Quick specialists who sat at our table and helped us push further than we would have on our own. 

Looking forward to the next one. 

Robots & Pencils is an AWS Advanced Tier Services Partner and AWS Pattern Partner. Request an AI Briefing today.  


About the Author 

Adrian Bird is Vice President of AWS Partnership at Robots & Pencils, where he leads the company’s AWS Partner strategy and execution, expands joint customer engagement, and strengthens alignment with AWS teams. Connect with Adrian.

Every Energy AI Initiative Stalls in the Same Three Places. Robots & Pencils Names Them. 

Three organizational failures. One misdiagnosis. A three-part series that tells energy leaders exactly where to look. 

Robots & Pencils, an applied AI engineering partner known for high-velocity delivery and measurable business outcomes, published The Fault Line, a three-part series examining the organizational failures keeping energy AI trapped in perpetual pilot mode. 

Forty percent of utility control rooms will deploy AI-driven operators by 2027, according to Gartner. Yet fewer than seven percent of energy organizations have gone live with even one AI use case, according to IDC and AWS research. The gap between investment and execution continues to widen across the sector. 

The Fault Line argues the problem lives in three specific organizational breakdowns that repeatedly prevent AI from reaching production environments and generating operational learning at scale. Scott Young, EVP of Growth and Strategic Alliances at Robots & Pencils, wrote the series for the energy executive who has approved the budget, built the pilot, and is still waiting for AI to run. 

“Energy executives are moving faster through decisive action that turns AI investment into operational advantage,” said Young. “Every quarter spent in evaluation is a quarter of compounding operational learning moving somewhere else. That is the fault line. And it is solvable.” 

Three Articles. Three Failures. One Compounding Reality. 

Part 1 – Going Live with Energy AI Starts with One Decision. The applications energy executives are waiting on are already ready to deploy. They have been for years. The first article examines the one thing standing between investment and results, and it is not technology. 

Part 2 – Energy AI Operator Trust Is Earned by Design. When AI stalls in the control room, the default explanation is operator resistance. The second article argues that explanation is aimed at the wrong problem entirely and that the organizations making the most progress stopped trying to manage adoption and started doing something else. 

Part 3 – The Energy AI Architecture Decision That Outlasts Every Tool. Most energy organizations are not building AI. They are accumulating it. The third article names the difference between a collection of tools that cannot learn from each other and an architecture that compounds, and it explains why no one selling AI tools has a financial incentive to close that gap. 

Why This Matters Now 

Investment, urgency, and operational pressure are converging quickly across the industry. The DOE’s Genesis Mission mobilized $293 million to advance AI in grid operations. ERCOT launched a dedicated Enterprise Data and AI organization in January 2026. At the same time, many organizations are adding AI systems faster than they are building the operational foundations required to scale them effectively. 

The Fault Line identifies three areas where that gap consistently appears: executive decision velocity, operator-centered system design, and architectures capable of compounding intelligence across the enterprise. The series also addresses the regulatory and operational realities utility leaders face while advancing AI initiatives within NERC CIP environments. 

Each article stands on its own. Together, the series presents a clear argument for how energy organizations move from isolated pilots to operational AI systems that improve through live deployment. 

“The energy sector is entering a period where AI advantage compounds faster than most executives expect,” Young said. “The organizations deploying now will be operating systems shaped by thousands of hours of real-world learning while others are still refining pilots. The opportunity belongs to the organizations willing to move.” 

Read the Series 

The Fault Line is available now at robotsandpencils.com. Energy executives interested in accelerating AI deployment and operational readiness can request an AI Briefing. 

Part 1 – The Fault Line: Going Live with Energy AI Starts with One Decision 

This three-part series examines the three organizational failures that keep energy AI in perpetual pilot and what leaders who have moved past them did differently. Each article stands alone. The full series is the argument. 

Part 1: Make the Decision | Part 2: Fix the Design | Part 3: Build the Architecture 

The energy executives I talk to have already committed to generative and agentic AI. The budgets are approved. The strategic plans name it. What most have not committed to yet is treating that deployment as an operational decision rather than an ongoing evaluation. That distinction is the entire ballgame. 

Gartner projects that 40 percent of utility control rooms will deploy AI-driven operators by 2027. Nearly all energy CIOs plan to increase AI investment, at an average spending increase of 38 percent. The DOE’s Genesis Mission mobilized $293 million targeting AI’s role specifically in grid operations and reliability. The conditions for large-scale energy AI deployment have never been more aligned. 

Yet an IDC and Amazon Web Services (AWS) study of more than 900 organizations found that fewer than 7 percent have reached full production with even one AI use case. 

The standard explanation is that energy is a uniquely complex operating environment. Legacy systems, fragmented data, strict regulation, and safety-critical infrastructure are real constraints. Energy leaders reach for them first when AI stalls. They are the setting, not the cause. 

The actual reason is less comfortable. The applications energy leaders want most are already ready to deploy. Most organizations are waiting for a technology problem to solve when the problem is organizational. 

The Energy AI Readiness Gap Nobody Is Naming 

According to the Federation of American Scientists’ assessment of the Department of Energy’s priority AI applications, nearly half are high-impact and ready to deploy today. Operations and reliability use cases score 3.6 out of 5.0 on deployment readiness, the highest category in the entire assessment. 

The most urgently needed applications are also the most architecturally mature. 

That creates a specific kind of organizational trap. When technology readiness runs ahead of organizational readiness, leaders rarely recognize the gap for what it is. An initiative stalls, and the natural assumption is that something technical still needs improvement. The model needs more training data. The data environment needs more work. The pilot needs another quarter before it can prove itself. 

What actually needs improvement is what we call the decision architecture gap. Most energy organizations have not built the organizational capacity to evaluate, commit to, and scale AI applications based on evidence of operational value rather than proof of technical completion. 

What the Data Is Already Telling You 

Energy companies already have the data. They are waiting on the decision to act on it. 

NREL’s Open Energy Data Initiative hosts 2.6 petabytes of data across more than 2,000 datasets from 227 providers. Utilities already hold enormous volumes of AMI telemetry, SCADA signals, outage history, maintenance logs, and weather correlations. The question is not whether useful data exists. The question is whether it is being treated as institutional memory or as archived history. 

These are not the same thing. Archived history answers questions when asked. Institutional memory learns continuously, surfacing patterns, updating predictions, and sharpening with every new cycle of operational data. We call this the institutional memory framework. The architectural commitment to treat operational data as a living learning system rather than a reference archive is what separates organizations that compound AI advantage from those that accumulate AI cost. 

The data foundation is already there. The decision about what to build on it is the only variable left. 

The Compounding Cost of the Wait 

The energy sector is entering a period where AI advantage compounds. Organizations that go live now will be running systems that have learned through thousands of hours of real operating conditions by the time their competitors are still refining pilots. 

Grid operations, reliability, and predictive maintenance are the applications energy leaders typically pursue first. They are also the ones that compound most sharply with continuous learning. A predictive maintenance system that has processed two years of real failure data across a fleet of transformers is qualitatively different from a system that has processed none. That gap does not close when the second organization eventually decides to start. It widens. 

This is the real cost of treating AI deployment as a technology problem to be solved rather than an operational commitment to be made. The loss is not a single delayed quarter. It is the accumulated learning gap that grows while organizations wait for a breakthrough that is not coming. 

Where the Decision Lives 

The energy leaders making the most meaningful progress on AI are the ones who answered a harder question. Which operational outcomes matter enough to organize the entire effort around? 

The starting point is simple. Grid load forecasting, AMI analytics, outage prediction, and field operations automation are all deployable today as agentic AI teammates that act on operational data utilities already own, execute decisions, and learn from every cycle. They are the foundation that makes every more complex application possible because each one builds the organizational infrastructure for learning, not just for experimenting. 

The right question for energy executives is not whether to invest in AI. That investment is already moving. The right question is whether the organization is built to learn from what it deploys, or whether each initiative will generate insight for one team instead of compounding advantage across the enterprise. 

Going live with AI in energy begins with a decision about what the organization is building toward and the commitment to treat every deployment as a step in that direction rather than a standalone test of the technology. 

That decision is available right now. The technology has been ready for a while. 

Building AI that operators will actually use requires a different kind of design than most energy organizations are attempting. Read Part 2: “Energy AI Operator Trust Is Earned by Design.”

About the Author 

Scott Young is EVP of Growth and Strategic Alliances at Robots & Pencils, where he works with energy executives to move from decision to live. Connect with Scott on LinkedIn.  

Key Takeaways 

FAQs 

What does organizational readiness for AI mean in energy? 

It means the organization has defined which operational outcomes matter most, built the data infrastructure to support continuous learning against those outcomes, and established the decision process to evaluate and scale AI based on operational evidence rather than technical completion. 

Why do so many energy AI initiatives stall after a successful pilot? 

Pilots succeed at the local level because they are designed to prove technical performance. They stall at scale because scaling requires organizational infrastructure — shared data foundations, clear outcome definitions, and the governance to move from proof to live. Most organizations have not built those yet. 

What is the difference between archived data and institutional memory for AI? 

Archived data answers questions when asked. Institutional memory learns continuously, surfacing patterns, sharpening predictions, and improving with every cycle of operational data. The distinction determines whether AI compounds across the enterprise or produces isolated results for individual teams. 

How do utilities close the gap between AI pilots and live deployment? 

The fastest path from decision to live is standardizing the data foundation before scaling the AI system. Organizations that treat operational data as a shared institutional asset rather than system-specific input compress deployment timelines significantly and avoid the fragmentation that keeps most pilots from going live. 

How long does it actually take to go live with energy AI? 

It depends almost entirely on data infrastructure readiness, not model complexity. Organizations that have standardized their data foundations and committed to treating operational data as institutional memory have gone live with AI in 90 to 120 days. Organizations that treat each deployment as a custom integration build take two to three times as long and often stall before going live. 

Which energy AI applications are ready to deploy today? 

Operations and reliability use cases score highest on deployment readiness across the DOE’s priority applications. Grid load forecasting, AMI analytics, outage prediction, demand response optimization, and field operations automation are all deployable now using data utilities already collect. The barrier is organizational commitment, not technology availability. 

What is the cost of waiting to deploy AI in energy? 

The primary cost is the compounding learning gap. AI systems improve through real operational data. Organizations that go live now will be running materially smarter systems in two years than organizations that delay. That gap widens with time and does not close simply by starting later with better technology. 

Part 2 – The Fault Line: Energy AI Operator Trust Is Earned by Design 

This three-part series examines the three organizational failures that keep energy AI in perpetual pilot and what leaders who have moved past them did differently. Each article stands alone. The full series is the argument. 

Part 1: Make the Decision | Part 2: Fix the Design | Part 3: Build the Architecture 

The default explanation for why AI stalls in energy operations goes something like this: operators resist change. They are comfortable with how things work, skeptical of technology they did not choose, and protective of the expertise they have spent decades building. The prescription that follows is predictable. Train them. Communicate more clearly. Involve them earlier. Manage the change. 

This explanation has merit. It is just aimed at the wrong problem. 

The systems at issue are agentic AI systems: they surface recommendations, trigger actions, and learn from every operator decision. That distinction determines how trust gets built. Operator trust is earned through design. The organizations achieving live AI deployment in energy have stopped treating operator skepticism as something to overcome and started treating it as the signal that shapes how they build. 

The Confidence Paradox 

AI is most valuable in precisely the decisions where experienced utility operators are most confident. This is not a coincidence. It is the nature of complex operational environments. Grid stability calls, equipment risk assessments, and outage response sequencing are the decisions where utility operators carry the deepest accumulated judgment. In many organizations pursuing grid modernization, that knowledge is not documented anywhere. It retires when the operator does. These are also the decisions where AI can process patterns that no individual, regardless of experience, can evaluate at the speed and scale that modern grid operations demand. 

This creates a specific problem. When an AI system surfaces a recommendation that contradicts an experienced operator’s intuition, the operator does not typically pause and reconsider. They override. Sometimes they are right to do so. Often, neither side ever finds out, because the correction disappears into a workflow without becoming feedback. The AI does not learn from the override. The organization does not learn from the pattern. The system gets evaluated on whether operators accepted its recommendations, not on whether acceptance or rejection produced better outcomes. 
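One way to keep an override from disappearing into the workflow is to log it next to the eventual outcome and evaluate on outcomes, not on acceptance. The sketch below is illustrative only: the `Decision` record, its field names, and the report keys are assumptions, and a real system would need a defensible definition of a "good" outcome.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    recommendation: str
    operator_action: str
    outcome_good: Optional[bool] = None  # filled in later, once the result is known

    @property
    def overridden(self) -> bool:
        return self.operator_action != self.recommendation


def override_report(decisions: List[Decision]) -> dict:
    """Compare outcomes of accepted vs overridden recommendations."""

    def good_rate(items: List[Decision]) -> Optional[float]:
        resolved = [d for d in items if d.outcome_good is not None]
        if not resolved:
            return None  # no evidence yet, so report nothing rather than zero
        return sum(d.outcome_good for d in resolved) / len(resolved)

    accepted = [d for d in decisions if not d.overridden]
    overridden = [d for d in decisions if d.overridden]
    return {
        "accepted_good_rate": good_rate(accepted),
        "overridden_good_rate": good_rate(overridden),
        "unresolved": sum(d.outcome_good is None for d in decisions),
    }
```

The `unresolved` count is the number to watch first. If it stays high, neither the AI nor the organization is learning from overrides, whatever the acceptance rate says.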

A Dalhousie University review published in Energy identified building human operator trust as the primary open challenge in the field, ahead of model accuracy, computational requirements, and integration complexity. That ranking matters. It reflects what researchers studying the most advanced energy AI deployments believe is holding back the most promising applications. 

What Change Management Gets Wrong 

The standard response to operator skepticism focuses on the operator. Train them differently. Explain the model’s reasoning. Show the accuracy data. Demonstrate value over time. 

What this approach misses is that operator confidence is earned through repeated, verifiable demonstrations at the specific decision types operators care about most. Those demonstrations require something most implementations do not provide: a visible, credible track record at the local level before the system asks for broader authority. 

Gartner warns that more than 40 percent of agentic AI projects will be canceled by the end of 2027, citing unclear business value and inadequate risk controls as the primary causes. In energy operations, inadequate risk controls and operator trust are the same thing. An operator who does not trust a recommendation will not act on it. An organization that cannot get operators to act on AI recommendations cannot demonstrate business value. The cancellation follows from the design failure, not from the technology. 

Alsaigh et al., writing in Frontiers in Energy Research, analyzed 3,568 academic papers on AI governance in energy and found that explainability is one of the most significant and least developed barriers to operator trust. The systems being deployed in energy are largely not designed to give utility operators what they need to verify, challenge, and ultimately rely on AI recommendations. That is a design gap, not a training gap. 

In regulated utility environments operating under NERC CIP standards, this design gap carries a second consequence. AI systems that cannot show their reasoning, support human override, and maintain audit trails fail both the trust requirement and the compliance requirement simultaneously. The design approach that earns operator trust in control room operations is also the one that satisfies regulatory expectations for human oversight of safety-critical decisions. 
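As a rough illustration of what "show reasoning, support override, maintain an audit trail" can mean in practice, the sketch below keeps each recommendation, its stated reasoning, and the operator's decision in an append-only, hash-chained log. Everything here is an assumption made for illustration: the `AuditTrail` class, its field names, and the hash-chaining approach are hypothetical, and nothing in it should be read as a statement of what NERC CIP specifically requires.

```python
import hashlib
import json


class AuditTrail:
    """Hypothetical append-only log; each entry carries the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        # Stable serialization so the same body always hashes the same way.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, recommendation: str, reasoning: str, operator_decision: str) -> dict:
        body = {
            "recommendation": recommendation,
            "reasoning": reasoning,                  # what the operator can verify or challenge
            "operator_decision": operator_decision,  # e.g. "accept" or "override"
            "prev_hash": self.entries[-1]["hash"] if self.entries else "",
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return False if any entry was altered or reordered after the fact."""
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

The design choice worth noticing is that the operator's override is stored as a first-class record next to the AI's reasoning, so the same log serves both the operator who wants to check the system and the auditor who wants to check the oversight.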

Designing Energy AI for Operator Trust, Not Adoption 

The organizations deploying AI that reaches production in energy are not persuading operators. They are proving themselves to operators, one decision category at a time. 

Research from Argonne National Laboratory’s GridMind system and the University of Vermont’s PowerDAG framework illustrates this principle at the applied research level. Both were built explicitly for expert decision-support augmentation rather than operator replacement. PowerDAG achieves a 100 percent task success rate specifically because it incorporates just-in-time human supervision as an architectural feature, not as a fallback. Keeping the operator in the loop is not a limitation of the system’s current capability. It is what makes the system trustworthy enough to act on. 

This design commitment is consistent across every advanced energy AI system in the current research landscape. Each was built with operator augmentation as the primary design requirement, not an afterthought. 

Every production-grade energy AI system identified in the current research literature shares this design commitment. That is the finding. The approach starts AI deployment at narrow, verifiable decision categories, builds a track record utility operators can see and challenge, and earns expanded scope based on demonstrated accuracy rather than elapsed time or training hours. It treats operator confidence as something AI must demonstrate, and organizational readiness as something that follows from the design. 

Progressive trust architecture is the design approach of starting AI deployment at narrow, verifiable decision categories, building a track record utility operators can see and challenge, and earning expanded scope based on demonstrated accuracy rather than elapsed time or training hours. It treats operator confidence as something AI must demonstrate, not something organizations must develop. 
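One way to picture progressive trust is as a scope gate keyed on decision category. The sketch below is illustrative only. The `TrustLedger` class, the `min_decisions` and `min_accuracy` thresholds, and the category names are assumptions, not part of any system cited above. In a real deployment, operators would help set the thresholds.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TrustLedger:
    """Hypothetical per-category track record that gates scope expansion."""

    min_decisions: int = 50     # assumed: verified decisions required before expansion
    min_accuracy: float = 0.95  # assumed: accuracy bar agreed with operators in advance
    _outcomes: dict = field(init=False, default_factory=lambda: defaultdict(list))

    def record(self, category: str, ai_was_correct: bool) -> None:
        # Each entry is one decision whose real-world outcome was later verified.
        self._outcomes[category].append(ai_was_correct)

    def accuracy(self, category: str) -> float:
        results = self._outcomes[category]
        return sum(results) / len(results) if results else 0.0

    def may_expand(self, category: str) -> bool:
        # Scope expands on demonstrated accuracy, never on elapsed time or training hours.
        enough_evidence = len(self._outcomes[category]) >= self.min_decisions
        return enough_evidence and self.accuracy(category) >= self.min_accuracy
```

Because the ledger is keyed by category, a strong record in one narrow decision type earns nothing in another. Each category has to build its own record before its scope expands.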

A Tampere University study published in February 2026 found exactly this pattern in practice. Researchers conducted 16 interviews across nine departments of a Nordic energy company and identified 41 AI-related use cases. Employees described successful AI introduction through incremental steps that aligned with existing workflows. They described it consistently as an evolution, one that fit the existing shape of the work rather than demanding the work reshape itself. 

The Operator as Feedback Architecture 

When the design takes hold, the dynamic inverts. Operator skepticism becomes the most valuable signal in the system. 

Every time a utility operator reviews an AI recommendation and accepts it, overrides it, or flags it as wrong, that interaction carries information the system needs in order to improve. That is the operator feedback loop. In an agentic AI system, every human interaction with a recommendation is training data. That is what makes operator trust an architectural requirement, not a change management task. Organizations designed to capture and act on those signals are going live with AI that compounds in intelligence over time. Organizations that treat operator involvement as a transition phase on the way to full automation are managing adoption in perpetuity. 
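A minimal sketch of what capturing those signals might look like follows. The `FeedbackEvent` record, the `Action` values, and the `summarize` function are hypothetical names chosen for illustration. The idea it tries to show is that each operator action is stored with its eventual outcome, so the system can be scored on whether acceptance or rejection produced the better result rather than on acceptance rate alone.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPTED = "accepted"
    OVERRIDDEN = "overridden"
    FLAGGED = "flagged"


@dataclass
class FeedbackEvent:
    recommendation_id: str
    operator_action: Action
    operator_note: str = ""
    # Filled in later, once the real-world result is known.
    outcome_favored_ai: Optional[bool] = None


def summarize(events: list[FeedbackEvent]) -> dict:
    """Score on outcomes, not on acceptance rate alone."""
    resolved = [e for e in events if e.outcome_favored_ai is not None]
    overrides = [e for e in resolved if e.operator_action is Action.OVERRIDDEN]
    accepted = sum(e.operator_action is Action.ACCEPTED for e in events)
    return {
        "acceptance_rate": accepted / len(events) if events else 0.0,
        "overrides_where_operator_was_right": sum(not e.outcome_favored_ai for e in overrides),
        "overrides_where_ai_was_right": sum(bool(e.outcome_favored_ai) for e in overrides),
        "unresolved": len(events) - len(resolved),
    }
```

The `unresolved` count is the part most implementations skip. It measures how many corrections disappeared into the workflow without ever becoming feedback.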

EPRI’s RADAR Initiative treats human capital development as a deployment prerequisite, not a follow-on activity. That sequencing reflects an understanding that the system’s intelligence and the operator’s intelligence need to develop in parallel, each informing the other, before the combination is ready to take on the decisions that matter most for grid modernization and operational reliability. 

The organizations that earn operator trust design AI around the rules operators already follow. The operator’s existing process becomes the specification. Trust follows from the design. 

Why Energy AI Operator Trust Is a C-Suite Problem 

Energy AI operator trust is an architecture decision, and it belongs in the executive conversation alongside every other architectural decision the organization is making. 

Energy leaders who reframe it that way will find their AI initiatives stop requiring managed adoption programs. When a system proves itself in decisions utility operators already own, and when it visibly learns from every interaction rather than ignoring operator judgment, trust follows from the design rather than preceding it. 

In the energy organizations getting this right, the technology earns the operators. That is the design commitment that everything else follows from. 

Progressive trust architecture earns the operators. Compounding intelligence architecture earns the advantage. Read the final article in this series: “The Energy AI Architecture Decision That Outlasts Every Tool.” 

About the Author 

Scott Young is EVP of Growth and Strategic Alliances at Robots & Pencils, where he works with energy executives to move from decision to live. Connect with Scott on LinkedIn

Key Takeaways 

    FAQs 

    Why do energy operators resist AI recommendations? 

    Utility operators do not resist AI because of technophobia. They resist recommendations they cannot verify, from systems that do not operate by the same rules they do. The organizations making the most progress treat operator skepticism as a design requirement rather than a change management problem. 

    How does design earn operator trust in energy AI? 

    Progressive trust architecture is the design approach of starting AI deployment at narrow, verifiable decision categories, building a track record utility operators can see and challenge, and earning expanded scope based on demonstrated accuracy rather than elapsed time or training hours. It treats operator confidence as something AI must demonstrate, not something organizations must develop. 

    How do we implement AI in NERC CIP-regulated control room environments? 

    NERC CIP compliance and energy AI operator trust are co-dependent in utility control room environments. AI systems that make their reasoning visible, support human override, and maintain full audit trails satisfy both requirements simultaneously. The design approach that earns operator trust in control room operations is also the one that meets regulatory expectations for human control over safety-critical decisions. 

    How do you design AI that energy operators will actually use? 

    The most consistently successful approach is designing AI around existing operator workflows rather than alongside them. That means incorporating the actual rules, constraints, and judgment criteria operators use, making AI reasoning visible in terms operators can evaluate and challenge, and starting with decisions where the AI can build a verifiable track record before expanding its scope. 

    What is the connection between operator trust and AI ROI in energy? 

    They are the same thing. A utility operator who does not trust an AI recommendation will not act on it. An organization that cannot get operators to act on AI recommendations cannot demonstrate business value. Gartner projects that more than 40 percent of agentic AI projects will be canceled by the end of 2027. Inadequate risk controls are among the primary causes, and in energy operations, risk control and operator trust are inseparable. 

    How do we capture retiring operator knowledge before it is lost? 

    AI systems designed to learn from every operator interaction are uniquely positioned to capture institutional knowledge from experienced utility operators. Each acceptance, override, and correction the system receives from a senior operator encodes judgment that would otherwise retire with that person. Organizations that deploy AI before their most experienced operators leave are building a knowledge base that survives the workforce transition. 

    Is operator trust in AI a technology problem or a leadership problem? 

    It is a design problem, which makes it a leadership problem. Technology teams will build what they are asked to build. If they are asked to minimize operator friction rather than earn operator trust, that is what gets built. The framing of the requirement determines the outcome. Energy leaders who put operator trust into the design specification rather than the change management plan get fundamentally different results. 

    Part 3 – The Fault Line: The Energy AI Architecture Decision That Outlasts Every Tool 

    This three-part series examines the three organizational failures that keep energy AI in perpetual pilot and what leaders who have moved past them did differently. Each article stands alone. The full series is the argument. 

    Part 1: Make the Decision | Part 2: Fix the Design | Part 3: Build the Architecture 

    The energy AI market offers no shortage of compelling grid modernization use cases, from predictive maintenance and load forecasting to DER orchestration and outage detection. Every one of them is real, proven, and deployable today. 

    None of them, taken individually, produces the result energy executives are actually trying to achieve. 

    What every grid modernization strategy is ultimately pointed toward is generative and agentic AI that gets smarter over time and compounds advantage across the organization. What most energy organizations are building is a collection of agentic AI tools that cannot learn from each other. The distinction between those two outcomes is the energy AI architecture gap, and no one selling AI tools has a financial incentive to close it. 

    The Fragmentation Consequence 

    Forrester’s 2026 predictions report projects that vendor fragmentation will force the majority of enterprises to compose what the firm calls agentlakes. These are composable architectures designed to manage and orchestrate fractured AI deployments that individual teams built without a shared foundation. That is not a forecast about a future problem. It describes what most energy organizations are constructing right now, one use case at a time. 

    An IDC and Amazon Web Services (AWS) study surveying more than 900 organizations found that 50 percent have deployed ten or more AI agents. Fewer than 7 percent have reached full production with even one use case. The math tells a clear story about AI scalability in energy: most organizations have more AI tools in flight than AI value to show for it. The agents are accumulating. The intelligence stays flat. 

    Gartner warns that more than 40 percent of agentic AI projects will be canceled by the end of 2027, citing unclear business value and inadequate risk controls as the primary causes. In most of these cases, the tools performed as designed. The AI architecture that would have allowed them to compound never existed. 

    What Energy AI Architecture-First Implementation Means 

    Architecture-first is not a technology preference. It is a design discipline that asks a different question before any tool is selected, any use case is prioritized, or any pilot is launched. 

    Most organizations start by asking what an AI system should do. The organizations achieving compounding AI advantage in energy start by asking what an AI system needs to know in order to get smarter every time it operates. 

    Those two starting questions lead to fundamentally different implementations. The first produces a tool. The second produces a learning system. 

    An AI tool solves a discrete problem and stays there. An AI architecture connects solutions so that each one makes the next smarter. The difference determines whether AI investment compounds into enterprise advantage or accumulates into enterprise cost. 

    In energy operations, this distinction matters because reliability planning, DER coordination, and asset investment prioritization are all continuous processes that should improve with every cycle of real operational data they touch. 

    The design discipline connects operational data across assets, decisions, and time so that every deployment makes the next one faster, smarter, and more valuable. It treats generative and agentic AI as an organizational capability that compounds with use, not a collection of tools to be procured. 

    The Four Layers Energy Organizations Skip 

    At Robots & Pencils, we work from a four-layer energy AI architecture framework that has emerged consistently across production-scale deployment research and our own engagement experience. It is the architecture that turns agentic AI into enterprise infrastructure, the kind that acts, learns, and coordinates across the organization rather than operating in isolation. Most energy organizations invest heavily in two of the four layers and skip the other two. That sequencing error is the primary reason AI teammates fail to become intelligent infrastructure. 

    The Business Context Layer is where operational data becomes institutional memory. SCADA signals, historian databases, market feeds, maintenance records, and workforce systems need not be consolidated in one place. They need to be unified in shared meaning, so that AI agents across every layer of the organization operate from the same understanding of what the data represents and what decisions it should inform. Connecting these data layers does not require opening OT environments or replacing existing control systems. The OT-IT integration approach that unifies shared meaning operates within current security boundaries and NERC CIP frameworks, making it compatible with even the most sensitive operational technology environments. 
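To make "unified in shared meaning" concrete, here is a small sketch under stated assumptions. The tag names, the asset identifier, and the `SEMANTIC_MAP` structure are all invented for illustration. The data stays in its source systems. Only the mapping from each source's local tag to a common asset, quantity, and unit is shared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    asset_id: str
    quantity: str    # canonical name, e.g. "active_power"
    value_mw: float  # this sketch only handles power, normalized to MW


# Hypothetical mapping: (source system, local tag) -> (asset, quantity, multiplier to MW)
SEMANTIC_MAP = {
    ("scada", "SUB12.XFMR1.P_KW"): ("xfmr-12-1", "active_power", 0.001),
    ("historian", "Sub12/T1/MW"): ("xfmr-12-1", "active_power", 1.0),
}


def to_shared_meaning(source: str, tag: str, raw_value: float) -> Reading:
    """Translate a source-specific reading into the shared vocabulary."""
    asset_id, quantity, to_mw = SEMANTIC_MAP[(source, tag)]
    return Reading(asset_id, quantity, raw_value * to_mw)
```

With a mapping like this in place, an agent reading 4000 from the SCADA tag and an agent reading 4.0 from the historian tag both see the same asset carrying the same load, which is the precondition for one agent learning from what another observes.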

    The Agent Execution Layer is where AI teammates perform the real work of forecasting, optimization, anomaly detection, and dispatch routing. These are agentic systems that act on data, coordinate across workflows, and improve through every operational cycle. Most energy organizations invest here first and most heavily. Without the Business Context Layer underneath, every AI teammate operates on local data with local context, unable to learn from what agents in adjacent systems are seeing or doing. The result is precisely what most energy AI programs produce: isolated wins that do not reinforce each other. 

    The Evaluation and Optimization Layer is where AI systems improve through operational feedback. Digital twins, physics-informed models, and continuous calibration convert operational experience into model intelligence. This is the layer that turns a static deployment into a learning system. It is also the layer most frequently absent from energy AI implementations, because it requires the first two layers to be functioning before it can deliver its value. 

    The Apps Layer is where utility operators interact with AI through conversational interfaces, dashboards, and decision-support tools that surface AI intelligence in human terms. This is often where energy organizations begin, because it is the most visible and the most straightforward to demonstrate. Starting here without the layers beneath it produces AI that surfaces recommendations operators cannot verify and cannot trust. 

    The DOE’s Genesis Mission, which mobilized $293 million to advance AI for grid operations, is structured specifically around the integration layer. Its primary working groups address data integration standards, shared computational infrastructure, and cross-system interoperability rather than individual use cases. The federal government’s most significant AI-for-energy investment is funding the architecture that makes use cases compound, not the use cases themselves. 

    What Compounding Looks Like at Scale 

    ERCOT created a dedicated Enterprise Data and AI organization in January 2026. Rather than establishing an AI team or center of excellence, ERCOT created an enterprise function that treats AI as organizational infrastructure rather than a departmental capability. That organizational move signals a shift from ad hoc AI experimentation to systematic, enterprise-wide architecture. ERCOT is building the foundation, not accumulating the tools. 

    The economics of getting this right at scale are significant. The Department of Energy (DOE) projects that virtual power plant (VPP) deployment at scale could reduce overall grid costs by $10 billion per year by redirecting spending from peaker plants to participants. Separately, DOE analysis projects that VPP deployment could avoid $17 billion in annual power sector expenditure by displacing new generation build-out. VPPs already provide peaking capacity at roughly 40 to 60 percent lower cost than conventional alternatives. NREL’s Autonomous Energy Systems program is designed to manage hundreds of millions of distributed energy resources through reinforcement learning and distributed decision-making. None of these outcomes is achievable with a collection of point solutions. They require AI that can coordinate across assets, learn from aggregated behavior, and improve through every dispatch cycle. 

    The same principle holds at the operational level. When workforce scheduling data, dispatch rules, real-time outage events, and multi-channel delivery connect into a single intelligent workflow, no individual component produces the result. The value lives in the connections between layers, not in any single tool operating independently. 

    The Energy AI Architecture Question to Ask Before the Next Vendor Call 

    The energy AI market will continue producing use cases, point solutions, and vendors faster than any organization can evaluate them. That pressure does not ease. 

    For utilities operating on regulatory capital cycles of three to five years, this matters more than it does in almost any other sector. The cost of the wrong architectural decision is not one quarter. It compounds across the next rate case. 

    What energy leaders can change is the question they ask before any solution enters their environment. Not whether a tool solves a problem they have. Whether adding that capability makes the rest of their AI smarter, or adds another isolated system their organization has to manage separately forever. 

    That question is harder to answer and slower to commercialize, which is why most vendors will not help energy leaders ask it. The answer might be that their tool does not belong in your architecture yet, or that it belongs in a different layer than the one they are selling it for. 

    This design discipline is not a product category. The organizations that adopt it as a discipline rather than a procurement checklist are the ones that will look back in five years and understand why the gap between them and their competitors only widened. The tools they deployed got smarter with every cycle. The tools their competitors deployed stayed exactly where they started. 

    The right partner makes progress inevitable. Robots & Pencils builds the four-layer architecture that connects your operational data, earns operator trust, and compounds intelligence across your energy business. Request an AI Briefing and find out what AI teammates look like when they are live inside your operations. 

    About the Author 

    Scott Young is EVP of Growth and Strategic Alliances at Robots & Pencils, where he works with energy executives to move from decision to live. Connect with Scott on LinkedIn

    Key Takeaways 

    FAQs 

    What separates an AI architecture from a collection of AI tools? 

    An architecture connects operational data across assets, decisions, and time so that every deployment makes the next one faster, smarter, and more valuable. A tool solves a discrete problem and stays there. The distinction determines whether AI investment compounds into enterprise advantage or accumulates into enterprise cost. 

    What are the four layers of energy AI architecture? 

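    The four layers of the Robots & Pencils energy AI architecture framework are the Business Context Layer, the Agent Execution Layer, the Evaluation and Optimization Layer, and the Apps Layer. 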

      Most energy organizations invest in the Agent Execution and Apps layers while underinvesting in the Business Context and Evaluation layers. This is the primary reason AI wins remain isolated rather than compounding into enterprise advantage. 

      What is the difference between an AI center of excellence and an enterprise AI function for utilities? 

      A center of excellence is a capability hub that individual teams draw from on request. An enterprise AI function treats AI as infrastructure that the entire organization runs on. ERCOT’s decision to create a dedicated Enterprise Data and AI organization in January 2026 reflects the latter model. The organizational distinction matters because enterprise infrastructure receives the investment, governance, and architectural discipline that shared service centers rarely sustain at scale. 

      Why do energy AI tools fail to compound into enterprise advantage? 

      Tools fail to compound when they are deployed without the architectural foundation that would allow them to share context and learn from each other. A predictive maintenance system that cannot access outage history cannot improve its predictions based on failure patterns across the fleet. A load forecasting system that cannot connect to DER dispatch cannot refine its models based on how demand response actually performed. Compounding requires connection, and connection requires architecture. 

      How does the DOE Genesis Mission inform energy AI architecture decisions? 

      The Genesis Mission is structured around data integration standards, shared infrastructure, and cross-system interoperability rather than individual use case development. Energy leaders can interpret this as a clear signal: the federal government’s most authoritative AI-for-energy initiative concluded that integration architecture is the primary bottleneck, not model capability. Organizations building their AI strategy around individual use cases are solving a second-order problem. 

      How do we evaluate whether our current AI architecture is designed to compound? 

      Ask three questions. First: can AI agents in different parts of the organization access and act on the same operational data with the same shared meaning? Second: does each AI deployment improve in accuracy and value over time based on operational feedback, or does it perform at the same level it was trained to? Third: when a new AI use case is deployed, does it make existing systems smarter, or does it operate in isolation? If the answer to any of these is no, the architecture is not designed to compound. 

      What should energy leaders ask vendors before selecting an AI solution? 

      Ask how this solution connects to the operational data the organization already has, how it shares learning with other AI systems in the environment, and which of the four architectural layers it operates in. If a vendor cannot answer the second question, their solution is a tool rather than an architectural component. That does not make it wrong to buy, but it does mean the organization needs to understand which layer it belongs in and what foundation needs to be in place before it will deliver compounding value.