Our Ukraine team spent May 6 at the AWS Summit (EXPO XXI) in Warsaw. Here is what we saw, what surprised us, and what we are bringing back.
EXPO XXI is a short ride from the center of Warsaw, and on a May morning you can feel the conference before you see it. The queue outside was long but moving fast. Hoodies, lanyards, laptop bags. The crowd skewed more senior than you might expect at a free regional event.
Our plan was deliberate. Show up early, split the agenda, cover more ground in parallel, regroup over coffee, and most importantly, validate our approach.

Robots & Pencils team members in attendance, from left to right: Stanislav Makar, Rostyslav Volskyi, and Bohdan Popovych.
Agentic AI Is the AWS Headline
The opening keynote made one thing clear. Agentic AI is the organizing thesis for everything AWS is building in 2026.
Three names anchored the story. Kiro, the agentic IDE that got a fresh push at re:Invent 2025, featured prominently with its spec-driven development model, sequenced task generation, and agents that produce tests alongside code. Nova 2, the model powering more of the AWS AI surface, continues its region-by-region rollout. AWS Transform, their modernization platform for mainframe, VMware, and .NET workloads, was framed as the agentic path into enterprise legacy systems.
Real customer stories on stage. Real numbers. Real screenshots. The European Sovereign Cloud and the EMEA AI Hub got dedicated time, which landed well with the Warsaw audience. The framing was consistent throughout: the shift from AI tools you prompt to AI agents that reason, plan, and act is underway. The question for builders is how you instrument, evaluate, and trust what those agents do.
That question got a very good answer in the next session.

The Session That Landed: AgentCore Evaluations in Production
Timing matters at a conference, and the AgentCore deep-dive landed at exactly the right moment. AWS spent the spring pushing AgentCore Evaluations hard. It went GA on March 31, 2026, and the Warsaw session put it directly in front of European builders.
The plain-language version of what it does: a managed service that continuously monitors agent quality against real production traces, not just test suites. You are shipping agents. You need to know they work. Handing someone a scorecard you hand-rolled for each project is not a sustainable answer. This is.
The built-in evaluators cover what matters in production:
- Correctness. Did the agent get the answer right?
- Helpfulness. Was the response useful to the person asking?
- Tool selection accuracy. Did the agent pick the right tool for the step?
- Safety. Did anything in the output violate policy?
- Goal success rate. Did the multi-step task complete?
- Context relevance. Did the retrieved context match the question?
On top of those you can configure custom evaluators: LLM-as-judge with your own prompt and model, or code-based evaluators running on Lambda. The same framework handles hallucination detection and JSON schema validation without forcing two different toolchains.
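To make the idea of a code-based evaluator concrete, here is a minimal sketch of the kind of check one might run: it scores a single agent response on whether it parses as JSON and matches an expected shape. This is plain Python and does not use the AgentCore Evaluations API. The function name, the field names, and the score/reason return format are all assumptions made for illustration.

```python
import json

# Hypothetical expected shape of an agent's structured reply:
# field name -> required Python type. Not taken from any AWS schema.
EXPECTED_FIELDS = {"order_id": str, "status": str, "total": float}


def evaluate_json_output(raw_output: str, expected_fields: dict = EXPECTED_FIELDS) -> dict:
    """Score one agent response: 1.0 if it parses and matches the shape, else 0.0."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return {"score": 0.0, "reason": f"invalid JSON: {exc.msg}"}

    if not isinstance(payload, dict):
        return {"score": 0.0, "reason": "top-level value is not an object"}

    # Collect every shape problem so the reason is useful in a dashboard.
    problems = []
    for name, expected_type in expected_fields.items():
        if name not in payload:
            problems.append(f"missing field '{name}'")
        elif not isinstance(payload[name], expected_type):
            problems.append(f"field '{name}' is not {expected_type.__name__}")

    if problems:
        return {"score": 0.0, "reason": "; ".join(problems)}
    return {"score": 1.0, "reason": "output matches expected shape"}
```

A check like this is deterministic and cheap, which is why it pairs well with an LLM-as-judge evaluator rather than replacing it: the code catches structural failures, the judge handles questions of meaning.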
The detail that made us lean forward: full OpenTelemetry compatibility. The evaluator scores flow into existing dashboards alongside session count, latency, token usage, and error rates. You can alert on agent quality the same way you alert on a CPU spike.
For anyone building agents on behalf of enterprise customers, this solves the credibility problem. “How do you know it works in production” is no longer a hand-waving moment.
The Best Conversation Happened at the Espresso Machine
One of the more useful exchanges of the day started while waiting for coffee.
AWS set up a cloud-ordered espresso bar on the expo floor. You scanned a QR code, placed your order in a small web app, and the espresso machine queued it. When the drink was ready, the screen showed your name. No line. No barista small talk. Beautifully on-brand for a cloud event, and genuinely better than the alternative.

While we waited, a conversation started with a Senior Solutions Architect at AWS. The topic was whether Lambda is a credible runtime for agentic workflows. The honest answer: it depends on whether you have state.
An agent is not a request and a response. It is a long, branching workflow with LLM calls, tool invocations, and occasional human-in-the-loop steps. Lambda durable functions, which AWS shipped in late 2025 and has been shaping for agentic use cases since, address this directly. Each LLM call and each tool invocation becomes a checkpointed step inside a single Lambda. If execution times out mid-loop, the next invocation replays from the last checkpoint and skips completed steps. No Step Functions wiring. No custom state store. No DIY replay logic. The orchestration lives in the function code, in the language you already use.
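The checkpoint-and-replay behavior is easier to see in a toy model. The sketch below is not the Lambda durable functions SDK. `CheckpointedRun`, `step`, and `agent_workflow` are hypothetical names, and the checkpoint store is an in-memory dict standing in for whatever the platform persists. It only illustrates the idea that completed steps are skipped when a failed run is invoked again.

```python
class CheckpointedRun:
    """Toy stand-in for a durable execution: remembers results of completed steps."""

    def __init__(self, checkpoints=None):
        # A real durable runtime would persist this store for you;
        # here it is a dict that survives between "invocations".
        self.checkpoints = {} if checkpoints is None else checkpoints
        self.executed = []  # names of steps actually run during this invocation

    def step(self, name, fn):
        """Run fn once per step name; on replay, return the saved result instead."""
        if name in self.checkpoints:
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result
        self.executed.append(name)
        return result


def agent_workflow(run, fail_at_tool=False):
    """Hypothetical two-step agent loop: a planning call, then a tool call."""
    plan = run.step("plan", lambda: "look up order 42")

    def call_tool():
        if fail_at_tool:
            raise TimeoutError("simulated timeout mid-loop")
        return f"tool result for: {plan}"

    return run.step("tool", call_tool)


store = {}  # checkpoint store shared across invocations

first = CheckpointedRun(store)
try:
    agent_workflow(first, fail_at_tool=True)
except TimeoutError:
    pass  # the "plan" step was checkpointed before the failure

second = CheckpointedRun(store)
final = agent_workflow(second)
```

In this toy, the first invocation runs only the "plan" step before timing out, and the second invocation runs only the "tool" step because "plan" is served from the checkpoint. The workflow function itself stays ordinary sequential code, which is the appeal of the model.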
The Java SDK went GA in April 2026. Durable functions are now available in sixteen additional regions.
The Best Hour of the Day: Knowledge Graphs
Two talks on knowledge graphs stood out as the strongest technical content of the summit. The first was delivered by Dmytro Romantsov, Senior SRE at Miro, on their internal AI agent built over an organizational graph. The talk was technically dense and honest: he walked through what failed before the team settled on a graph-backed architecture, what the graph actually contains, how updates flow into it, and where the approach delivers measurably better results than the pre-graph baseline.
After the session, we walked over to talk to him. Small-world moment: we share a first language, switched off English immediately, and the conversation opened up. The core thesis from both the talk and the follow-up conversation was consistent. Enterprise AI agents are only as good as the organizational knowledge they can reason over. A graph gives that knowledge structure, updateability, and query depth that flat retrieval cannot match. That is not a new idea, but watching it validated independently at Miro’s scale makes the argument more concrete.
The second strong graph talk came from an SLB engineer in DEV207, on context graphs for explainable AI agents. The framing that stuck: the difference between a state clock and an event clock. Most pipelines today reflect the current state of a system. A context graph that also captures decision events can answer “why did this happen, and in what order.” That is the kind of explainability enterprise buyers are starting to require as agents move from pilot to live.
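One way to picture the state clock versus event clock distinction is a store that keeps both views side by side. The sketch below is our own illustration, not the design presented in the talk. `ContextStore`, its fields, and the ticket data are hypothetical, and a real context graph would model nodes and edges rather than a flat list.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Keeps both views: the latest value per key, and the ordered decisions behind it."""
    state: dict = field(default_factory=dict)   # "state clock": what is true now
    events: list = field(default_factory=list)  # "event clock": what happened, in order

    def record(self, key, value, reason):
        seq = len(self.events) + 1  # monotonically increasing sequence number
        self.events.append({"seq": seq, "key": key, "value": value, "reason": reason})
        self.state[key] = value

    def why(self, key):
        """Return the ordered (seq, value, reason) history behind the current value of key."""
        return [(e["seq"], e["value"], e["reason"]) for e in self.events if e["key"] == key]


store = ContextStore()
store.record("ticket-7.priority", "low", "default triage rule")
store.record("ticket-7.owner", "team-a", "routing by component")
store.record("ticket-7.priority", "high", "customer escalation")
```

Reading `store.state` answers "what is the priority now", while `store.why("ticket-7.priority")` answers "how did it get there, and in what order". The second question is the one a state-only pipeline cannot answer after the fact.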
Asking Honest Questions About AWS Transform
The AWS Transform booth was busy. We arrived with a direct question about IBM RPG support and walked through the answer methodically with a Solutions Architect for Migration and Modernization at AWS.
The most telling moment was watching an AWS specialist type the same question into their own tool in front of us. The answer came back: yes, with limitations, followed by pages of caveats. Informative in its own way.
The bottom line is that AWS Transform is production-grade for COBOL, Java-to-JavaScript migrations, VMware modernization, and mainframe workloads. RPG support is real but not ready for complex production use cases. We left with clarity on where the tool genuinely shines and where the right path is a combination of other tools and hand-rolled pipelines. That kind of honest answer is the second-best outcome at a conference. It tells you your reasoning was sound.
The VMware migration angle, by contrast, is genuinely strong. Broadcom’s license changes are creating real urgency for customers running on VMware infrastructure. Worth flagging for relevant engagements.
The Compute Thesis: AWS is Sizing Infrastructure for Self-Managed AI
A theme ran underneath the agentic-AI headline all day: AWS is provisioning compute to match the shape of AI demand, and demand for these workloads is high right now.
Two sessions made the same point from opposite ends of the price spectrum. Comarch walked through a real migration from x86 to AWS Graviton-based instances, with meaningful cost reductions and measured performance gains. The honest part of their talk: Graviton is not a flag flip. If you have native code, JNI bindings, or JIT-tuned hotspots, you pay for the migration before you see the savings.
At the other end of the spectrum was Meta’s agreement to deploy AWS Graviton processors at scale, starting with tens of millions of Graviton cores. It was announced ten days before the summit and explicitly framed around CPU-intensive agentic AI workloads: real-time reasoning, code generation, and multi-step task orchestration.
For Robots & Pencils, this opens a third option alongside Bedrock and direct provider APIs: self-managed inference on Graviton. For clients with data-residency constraints, predictable high-volume workloads, or smaller open-weight models where managed-API margins make self-managed attractive, the playbook is now well-documented and accessible. Independent benchmarks on Llama 3.1 8B have Graviton4 delivering roughly 2x the tokens per dollar of comparable x86 options for that model class.
A Practitioner’s Checklist for 2026
The session that generated the most useful signal for client-facing conversations was DEV209, delivered by Tomasz Dudek, Data and AI Team Lead at Chaos Gears and an AWS Machine Learning Hero. The premise was simple: AI has been mainstream for over three years. He has watched hundreds of Amazon Bedrock projects pass through his hands. Most near-failures trace back to a small set of repeatable mistakes.
The talk was the inverse of a vendor pitch. Here is exactly how teams stall before the first line of code. Here is what to do instead. He closed with 13 numbered tips for approaching AI projects in 2026. The final line: “Have evals, really.”
It was good to hear a practitioner at that level land on the same conclusions we have been operating on. The teams doing this work at scale are converging on the same principles, and the list mapped closely to how we already approach agent quality on client engagements. Confirmation from that angle is worth having.
The Parts That Were Just Fun
Not everything at a summit is a session worth writing home about. But a few moments stood out for the right reasons, alongside the Serverlesspresso bar, which was cool enough to warrant a second mention.
The AWS Drive Your Data Formula 1 simulator was exactly what it looked like: two Fanatec rigs, full wraparound LED screens, a Canada time-trial, and a results board you could compete on. The pitch underneath was real telemetry and lap analytics. The booth’s job was to draw a crowd, and it absolutely did. The team took turns.

And the Ukrainian-speaking community was well-represented at this summit. Several familiar-sounding conversations happened in unexpected corners of the expo. That part mattered.
What the Day Confirmed
The most useful thing a conference can do is sharpen your picture of where the tools are today versus where they are heading. Warsaw 2026 did that well.
Agentic AI is no longer a roadmap commitment from AWS. It is the organizing logic of everything they showed. Agent evaluation infrastructure is production-ready and instrumented the way mature engineering teams expect. The compute story has matured to a point where self-hosting is a genuine option for the right workloads, not just a theoretical one. Knowledge graphs as a foundation for enterprise AI agents are getting independent validation at scale. And the practitioners who have been doing this work longest are converging on the same principles around evaluation, quality gates, and shipping agents that are honest about what they know.
None of that surprised us. All of it was good to see confirmed.
Warsaw 2026 delivered real technical depth on agentic AI, agent evaluation, and knowledge graphs. The team went in with specific questions and came back with sharper answers, a few useful new contacts, and a strong argument for cloud-ordered coffee at the next internal engineering day.
Robots & Pencils is an AWS Advanced Tier Services Partner and AWS Pattern Partner.
Written by Bohdan Popovych (Robots & Pencils Ukraine Engineering Manager), Rostyslav Volskyi (AWS Certified Solutions Architect and Amazon Web Services Developer), and Stanislav Makar (AWS Certified Solutions Architect – Professional).







