Agent-swarm leap: We built a UI for our agents. They used it more than we did.

tl;dr: Slow is smooth, and smooth is fast. Agent-swarm didn’t make us faster by coding more. They made us faster by letting us plan instead of firefight.

Feb 02, 2026

When we built our first agent swarm, we expected to ship faster. Instead, we spent weeks monitoring it like anxious parents. Here’s what changed.

Late November, after a good experience with our cc-plugin [1], we gained enough confidence to believe we would have a positive ROI by building our own cloud coders. We tried multiple existing solutions, think conductor, superpowers, etc; and nothing was remotely up to the standards we needed, so we decided to give it a try.

The first few weeks we seldom used it. It was a fun toy. We were pretty comfortable navigating locally our 5 claude-code tabs, and having to monitor progress in a foreign behavior made it so that it didn’t really feel like an autonomous developer. It felt like training an intern. A month in, though, the first behavior shift happened. We crystallized 2 key learnings:

(1) We are used to working with peers in very integrated ecosystems (github, slack, documents, etc.) and,
(2) Lack of context was the most tedious part of the process.

We were the bottleneck, and that realization is what allowed us to 8x the productivity of our agent swarm in the last 8 weeks.

First: You are not the user, they are

It’s so easy to create a webpage nowadays, that everything you do feels like deserving of their own web footprint. That was our first mistake.

We created a UX meant for us to chat with our swarm, it is quite wonderful and we rarely used it. In fact, I don’t think I ever checked it more than a couple of times a week. Not since the start.

In a world where we are already communicating over so many different apps (whatsapp, notion, linear, jira, slack, imessage, email, github, etc) adding a new one seems so tedious, that in foresight it is obvious we wouldn’t put with that friction. Our first learning was clear:

Agents used our chat more than ourselves.

The first obvious moment was when we were commuting and we wouldn’t find out our agents were stuck. The second one was the extreme frustration to see that an agent had spent the night waiting for a simple “yes/no” answer. That made it obvious we needed to introduce 2 changes:

Slack app to interact with our swarm,
desplega-bot to interact directly with PRs.
Funny enough, we had to actually create a github user for our bot, given that github doesn’t allow you to tag apps, otherwise. At least not reliably.

Now our agents are part of every PR, and every conversation. We went from monitoring our agents as you do with software to actually having conversations as we did with our peers. The spike in interactions was immediate, from a couple of times a week, to tens of times a day, as they interacted with our product. Our relationship changed to be much more hands-off, it’s just that they were doing more…

Second: Give them alert feeds

Context, context, context, context. If you use an observability platform, you are used to copy long stack traces and pasting them in your coding LLM. Alas, you are also used to seeing lots of noise, and disregarding half of what you see.

Guess what? LLMs are better than us at understanding those messages, but lack the context to understand the importance of them. Hence our second lesson:

Give them access to alerts, and invest on pruning them constantly.

Specifically, gave them access to Sentry via sentry-cli (agents are great at CLIs, but that’s for another article)… and we had to turn it off. The first couple of days were a disaster, we had alerts that were expected, even positive, and agents proposing different ways to break our system. The second week was even worse, agents seem to reply to random alerts, ignore others, and misinterpret everything in that channel. It wasn’t the panacea people promised, at least not initially. Instead, we took the time during the following weeks to actually tag manually the agent on those alerts we wanted them to intervene. For the negligible ones, we also ask them to mute/ignore/change them.

17 merged PRs from desplega-bot in the last few weeks, incl. fixes, docs & refactors that would've never happened otherwise.

With a self-developed system, the amount of meta-work you could do in every flow is unparalleled, and grasping such a concept early on is absolutely key: you should always ask it to improve itself, and it will.

Week over week, the system continues to finetune itself, with PRs that are either simple fixes, or “correcting” alerts. In a few cases, they would span major refactorings. Seems quite straight forward right?

The next major step: what happens if we give them objectives on performance metrics?

Third: Make them compete

When resources are scarce, you pick one path and commit. That’s how most of us learned to build. But agent swarms break that constraint, we have much more resources than before, and we have to update our instincts.

Microsoft, Google, or Apple have been known for having unlimited resources. As a result, they have multiple teams working on the same product, without even knowing. For example, how many social apps or messaging apps did Google create? What about Apple having multiple competing teams developing the iphone, to then pick the best one at the end? Now you can do the same.

Albeit you probably don’t have those billions of users, you can opt for a similar approach. Now your research and planning costs, and even implementation costs, have reduced greatly, so creating 5 versions of your next feature shouldn’t be that stressful. It’s a world where POCs could succeed. That’s our 3rd learning:

Don’t do it once, do it thrice

Let me give you an example: You want to unify your web table component across your codebase. Before, you’d ask your team to research all available tools, pick one and then use that one. At best, you’ll ask them to try and implement 2 of those and compare. Now, you can do something completely different, you can actually migrate all your tables (or a good number of them) and then actually see it in production under a feature flag. Is it really smoother? Is it really optimized for our QPS?

It doesn’t mean you are doing it for everything, it just means that now you can have better informed decisions at a fraction of the cost.

The-one-thing

Our leaps didn’t come from coding every imaginable option, they came from realizations after tinkering with AI for days, weeks and months. What initially was counterintuitive, becomes obvious once you’ve done enough.

Give yourself time to plan, instead of react.

We found out we were creating teamplayers, instead of a new devex, and that actually completely shifted our approach.

It was a good Monday catch-up with Taras.

Discussion about this post

Ready for more?