If you're going to build Tony Stark's AI assistant, you know, Jarvis - Wake up, daddy's home. - Welcome home, Sir. - You're gonna need more than one agent to get that job done. Agent-to-Agent or A2A is a great thing to have in your back pocket. Most of us who are doing multi-agent work these days, we're talking about multiple instances of Claude Code hammering away on a code base. Maybe a few projects, maybe one, but how are they coordinated? Through the file system, looking at source files, maybe markdown, task lists, that kind of thing. It's architecturally not that complicated. You don't have big authentication questions or messaging or anything like that. And it's super useful, very exciting. I personally love it, but that's not how you build Jarvis. That's not how you're going to build multi-agent systems in the enterprise. Let's take a look at how you might do that. Now, in my case, for my smart home, we're going to start with a speech to text system that's going to listen for a wake word, like, "Hey Jarvis," or something like that, and then take whatever comes from that and submit that to an agent. Now, this agent serves as an orchestrator or what is sometimes called a foundation agent. Now, that is going to have the basic agentic loop of the system. So that's going to receive the prompt from the human user and the speech to text interface. It's going to get a list of tools or agents that it's going to call to get its work done. It's going to do the things specified by those tools, actually make the tool calls, not just figure them out. And then it's going to loop, something like that. We could elaborate on that a little bit, but that's the basic agentic flow. Now, because we don't want this foundation agent to have to know everything, like we don't know what Jarvis' future capabilities are going to be, and we don't want to make this loop some terrible rabbit warren of if-then statements, we're going to use composition. Composition is a very powerful pattern to apply in situations like this. That means we're going to make other agents. So let's say I have a calendaring agent and maybe I have an agent to control the lights. So we break it down into individual agents that can do their own kind of specialized things. I could say, "Hey Jarvis, find some time "where Rachel and I can get together with Tony and Pepper." And this main orchestrating agent is going to delegate that to the calendaring agent, which has access to the household calendar, has access to my calendar, my wife's calendar, can find some evenings that might work out. And maybe even if it's got access to contact information, could text or email that to our friends to see when we can get together. That's all possible to automate. The lights agent, of course, trivially it's going to have access to the smart home light stuff. I mean, that stuff's been possible for more than a decade now with voice control. But maybe it also has access to say, presence detectors in rooms. So I could say, "Hey Jarvis, turn off all the lights "if there's nobody in the room." And it'll know, "Okay, well this room is occupied, "we'll keep these on, these five rooms aren't, "we'll turn those off." It's a great automation of a core dad task. Now what we're going to do is have each one of these agents publish a description of themselves called the agent card. That agent card is going to have a few things in it. The agent has, well, a name, that seems nice, right? The agent has a function, like what does this agent do? That's a text description of the purpose of the agent. I control the lights and tell you who's in a room. I manage calendaring and scheduling for the household. That function is going to find its way back into the prompt. Let's not forget that the foundation agent, the orchestrator is going to submit to its large language model, whatever model it's using, that function is going to go in there and that's going to help it figure out which agent to delegate to in these steps here. So that's just a text description of what the agent does. It's going to have an HTTP endpoint address where the calling agent can submit requests when we're actually going through the flow and any special skills the agent has, not just a description of the functions, but specifically what can I do? Can I send requests? Can I turn off the lights? Can I wield hammers forged in the hearts of dying stars? Whatever those skills are that these agents have, those are listed there. And finally, of course, how and whether it wants to do authentication. So that agent cards, a JSON document at a well-known URL at the domain of the agent itself. All this foundation agent needs to know is the address of the agents that compose it and compose this whole multi-agent scheme. It can get the agent cards and put the right parts of the agent card into the context window. And so its model can help it figure out when it needs to delegate. Super powerful, very easy to discover. And in the future, we can build even more agents, simply add those in and our foundation agent remains pretty much unchanged. Now, how do interactions happen across this link? Well, let's introduce some concepts here. There's what we're gonna call the client agent over here. I'm calling that the foundation agent or the orchestrator agent, and then the remote agent. So these are the two agents in the interaction. Of course, we can have multiple remote agents and importantly, remote agents themselves may also act as client agents. Remember, we're using composition here. We don't need to know. We're calling that remote agent, asking it a question. It might delegate to other agents. That's kind of not our business, but this whole thing can grow as a network. Now, everything that happens between them is JSON RPC messages over HTTPS. That's just the basic structure. And there are three kinds of interactions that can happen. One is where I make a request and I have a synchronous response. It's over and done. The remote agent is able to answer me. This is not a thing that's gonna run on for more time. We're done. Nice case there, right? Another is where I have a request and this remote agent knows is gonna take a while. So the replies will be streamed using server sent events. The third category is where we know we have a request and we are gonna need an asynchronous notification later, which will happen via a webhook. And that webhook URL is submitted as a part of the request when the client is asking the remote agent a question. So the remote agent knows where to go for the webhook when it needs to send its notification. In the case of the long running requests that aren't just over and done, the remote agent is gonna kick off what's called a task and it's gonna send back a task ID in the response. So the client now is able to keep that as a piece of context and a future request say, "Hey, by the way, I know this is associated with this task. We get some chunks back on this long streaming thing or I've sent a request and asked for an asynchronous notification. I can still pull against that task ID and see how things are going right now and get intermediate results." Now, all of these interactions back and forth here all take the form of messages. That's what we call them. Each one of them has a role and one or more parts. It's very simple. So it's a little bundle of data that can be plain raw text data, it can be raw binary, it can be a URL pointing to some other resource that's associated with the request or the response, can also be a payload of additional JSON data. So fairly simple structure. You've got these messages going back and forth over JSON RPC, over HTTPS in these three broad categories of things. You've got tasks for long running operations on the remote side and you're able to track those task IDs on the client side. I mentioned agents get to describe how they want to do authentication. The simple answer there is that A2A basically delegates that whole topic to conventional web standards. So that's gonna look a lot like OAuth. And in the case like Jarvis, where I can't have a pop-up that says, hey, I want to authenticate against my Google account so I can see my calendar, I will have had to create a token. So there will be some secret management there that will be then a bearer token in the authentication header in this HTTP request. But that's all the usual suspects there. It's a fairly well-worn path. If you don't know how to do it, it's okay, Claude does. So you'll be able to get some help with that. What language can I use to write Jarvis or can you use to write your own multi-agent system at work? Well, the usual suspects are supported in the current language bindings. There's Java, Python, Go, JavaScript and languages of the.NET platform. So pretty inclusive list there. Like I said, the experience most of us have with agents and multi-agent systems right now are Claude writing a bunch of code for us, even a few agents working together on that. And that is so powerful and I think such an important part of the unfolding nature of software engineering and what it means to build systems. But that's not the whole story. I think agentic microservices, systems like this, I'm imagining a smart home assistant, think how this applies to problems at work in an actual business, answering requests from internal people, customers, interacting with various systems. This stuff's pretty powerful and agent to agent could well be an important part of that network of agents that you're gonna build next.
Most multi-agent work today is pretty simple: instances of Claude Code hammering away on a codebase and coordinating through the file system on a laptop. If you want to build multi-agent systems in the enterprise though, you'll have to deal with more advanced things like authentication, complex messaging, and agentic microservices. In this lightboard, Tim Berglund imagines what it might look like if Tony Stark had used Agent2Agent (A2A) to build Jarvis, his AI home assistant. Tim walks through what it takes to wire together distributed agents at enterprise scale, from agent cards to interaction patterns (synchronous, streaming, and async webhooks), tasks, messages, and how authentication works. Tim's related videos: Agents Skills or MCP?: https://youtu.be/pvxNcQTcIy4 | Overview of MCP: https://youtu.be/FLpS7OfD5-s 0:47 Foundation agent + composition 3:11 Agent cards: discovery and self-description 5:16 Client agents, remote agents, and interaction patterns 6:45 Tasks, messages, and long-running operations 8:01 Authentication via standard web protocols 8:42 Language support (Java, Python, Go, JS, .NET) 9:01 From coding agents to agentic microservices You can try Confluent Intelligence at https://www.confluent.io/product/confluent-intelligence Promo code: CONTEXTENG LEARN MORE ► Agent Skills vs MCP Lightboard: https://youtu.be/pvxNcQTcIy4 ► MCP Lightboard: https://youtu.be/FLpS7OfD5-s ► Context Engineering Lightboard: https://youtu.be/Cs7QiSi8KLY ► Confluent Developer: https://developer.confluent.io CONNECT Subscribe, if you dare: https://www.youtube.com/@ConfluentDeveloper?sub_confirmation=1 Community Slack: https://confluentcommunity.slack.com X: https://x.com/confluentinc Linkedin: https://www.linkedin.com/company/confluent GitHub: https://github.com/confluentinc Site: https://developer.confluent.io ABOUT CONFLUENT DEVELOPER Confluent Developer provides comprehensive resources for developers looking to learn about Apache Kafka®, Apache Flink®, Confluent Cloud, Confluent Platform, and any other technology related to the broader Data Streaming Platform. Content on Confluent Developer includes courses, getting started guides, topical deep-dives, patterns, tutorials, and listings of community events. Learn more at https://developer.confluent.io. #a2a #agentskills #claudecode