Don’t Leave Quality to Chance: Step up your Software Development Life Cycle
A dirty little secret when using AI for coding: you are now in a scaleup.
tl;dr: You have 10x productivity from your SWE team, but your quality and efficiency have dropped. This is because you have effectively scaled your team overnight; what was previously a team of 10 now feels like a team of 50. We discuss a few reasons for this, the theory behind it, and how teams have overcome these challenges before… because they have done it in the past!
A few times in my career, I've had the opportunity to be a direct manager of 10+ individuals. My record is actually ~25, and I’m not proud of it. Usually, what led to those circumstances was either me being a late hire after years of poor management practices or a major re-org that resulted in me leading a new organization. I don’t recommend it; however, I would do it again if needed.
This is the art of managing the unmanageable, and in each case, I did two things from day one that proved incredibly rewarding:
Put safeguard processes in place that allowed me to shape the work in the organization (over the years, I started calling that ‘defining principles’).
Make myself redundant by spotting the talent that would later lead the organization.
To be clear, I have tried many other approaches throughout my career, and I learned quickly that these two were the only things that mattered. I’ve done this for scaling teams in Growth, Product, UX, HR, and Tech many times, and it’s a key area I mentor on. It used to be a challenge present only in larger teams; with AI, it is becoming a challenge in Tech even with a team of two people.
So let’s talk about those Safeguard Processes and leave the People Management topic for a future post.
Why is this a problem now?
Theories like the Ringelmann effect, Amazon’s Two-Pizza Team rule, and Brooks's Law were all based on humans as the main source of labor. That means humans were the ones producing the work, and very few were leading that work. To be specific:
Free-riding is the key concept in the Ringelmann effect, and it’s the result of others being so distracted that they cannot see your lack of contribution. If others are 10x as productive, it means you can ‘vanish’ much earlier on.
Single-threaded leaders may not be affordable or needed in a world where 90% of the work is defining a direction, and the rest just happens. Maybe you can save on that second pizza.
Brooks's Law focuses on the communication overhead of a growing team: the number of person-to-person channels grows roughly with the square of headcount. However, you can see how that same overhead also appears as a result of multiplied productivity. There are more things to communicate than before, and humans are particularly slow when it comes to relaying information to each other.
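To put rough numbers on that last point: a common reading of Brooks's Law counts one communication channel per pair of people, which gives n(n-1)/2 channels for a team of n. Below is a minimal sketch of that arithmetic. Treating an AI-augmented team as if it had a larger "effective" headcount is my simplifying assumption for illustration, and the function name is hypothetical.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct person-to-person channels in a team of `team_size`."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    # Every pair of people is one channel: n * (n - 1) / 2.
    return team_size * (team_size - 1) // 2


# The tl;dr framing: a team of 10 that now "feels like" a team of 50.
for size in (2, 10, 50):
    print(f"{size:>3} people -> {communication_channels(size):>5} channels")
```

Under that assumption, going from 10 to 50 multiplies the headcount by 5 but the channels by roughly 27 (45 versus 1,225), which is why the coordination pain shows up well before the productivity gain is fully felt.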
As a result, now more than ever, CEOs, CTOs, Heads of QA, and VPs of organizations of any size are concerned about the same thing:
How to manage larger (augmented) teams?
The good news is that the solution has been around for a while, and it may be easier to implement nowadays, given that your human team is smaller. As I mentioned above, Leading by Principles was popularized by individuals like Andrew Grove, Bill Campbell, and Ben Horowitz as the only way to introduce sustained change in an organization. Tim Howes referred to it as “Make Peace with Repeating Yourself (Ad Nauseam).” I prefer pointing people to my writing, as suggested by Ray Dalio’s Principles or Frederick Brooks’s The Mythical Man-Month.
Here I propose a high-level framework to create a high-velocity, self-correcting system focused on three short principles that are core to bringing your SDLC and organization to the next level:
Writing, and reading, before doing
A single Mission Control Center
Eliminating the toil
Before getting into the weeds, I want to stress one thing: This can be used beyond Tech, as this is not a Tech problem but an organizational challenge. Having said that, below I’m going to go into the specifics of how to implement this in Tech organizations. Sorry, not sorry!
First: Writing, and reading, before doing.
Suddenly, your output is not the output of a single person anymore; it’s the output of a team. That means your solutions could introduce larger mistakes, faster than before. You could, let’s say, create software that spends $1,000/hr in resources in an afternoon, and you wouldn’t notice. Or you could develop what you think is the right feature very quickly, overnight, and expose all your customers’ PII without noticing.
Yes, your lead is going to review the code, but sometimes the issue isn't a distraction; it’s simply a lack of knowledge. The feedback from your CTO or VP will eventually come, but maybe too late. Other times, your team will end up creating an overly complex solution, e.g., solving an NP-hard problem in the front-end and making your app unusable without anyone noticing. After all, part of the trade is bringing simple solutions to complex problems.
There’s more, though! Diverging is also a major risk. One of the biggest challenges we’ve seen is learning from each other and Breaking Silos. If the person sitting next to you has discovered how to be 2x more efficient in some aspect, you want to learn it, fast! This is the way to accelerate the whole organization long-term, and nowadays that tweaking could give you a key competitive advantage.
To get started on this, start easy, and enforce it:
Add 30 minutes a day for a design review (or PRD review).
Force constraints:
A document size limit of two pages.
The document must be shared 24 hours in advance.
Only comments added to the document at least 2 hours before the review will be discussed.
The output is a clear "yea" or "nay."
Shift Left!
Invite everyone to the meeting: QA, UX, Tech, Product, the CEO if they want to!
The magic rule: Speak now or forever hold your peace.
Discussions cannot drag on; you need to move fast, so a decision made in that call needs to hold for the next few months at least.
You will get one thing out of this process: a daily cadence for your team. You’ll see how you can remove many ad-hoc meetings if you handle this properly. Above all, the number of ‘surprises’ from someone wanting to change something late in the game should be close to zero, if you are using the magic rule properly.
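If you want the constraints above to be enforced by something other than willpower, they are simple enough to encode. The sketch below is one possible shape for that check, not a reference to any existing tool: the `DesignDoc` and `Comment` types, the function names, and the sample data are all hypothetical, and only the three thresholds (two pages, 24 hours, 2 hours) come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds taken from the constraints listed above.
MAX_PAGES = 2
MIN_SHARE_LEAD = timedelta(hours=24)
MIN_COMMENT_LEAD = timedelta(hours=2)


@dataclass
class DesignDoc:  # hypothetical record of a submitted design doc or PRD
    title: str
    pages: int
    shared_at: datetime


@dataclass
class Comment:  # hypothetical record of a comment left on the doc
    author: str
    text: str
    added_at: datetime


def doc_is_eligible(doc: DesignDoc, review_at: datetime) -> bool:
    """A doc is reviewed only if it is short enough and was shared early enough."""
    return doc.pages <= MAX_PAGES and review_at - doc.shared_at >= MIN_SHARE_LEAD


def comments_to_discuss(comments: list[Comment], review_at: datetime) -> list[Comment]:
    """Keep only the comments added at least two hours before the review."""
    return [c for c in comments if review_at - c.added_at >= MIN_COMMENT_LEAD]


# Illustrative data only.
review_at = datetime(2025, 1, 15, 10, 0)
doc = DesignDoc("Billing export", pages=2, shared_at=review_at - timedelta(hours=30))
comments = [
    Comment("qa", "What is the rollback plan?", review_at - timedelta(hours=5)),
    Comment("ux", "Late thought on the copy", review_at - timedelta(minutes=30)),
]

print(doc_is_eligible(doc, review_at))
print([c.author for c in comments_to_discuss(comments, review_at)])
```

In this example the doc qualifies for review, and only the QA comment makes the agenda; the UX comment arrived 30 minutes before the call, so under the magic rule it waits.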
Second: Mission Control Center
If you are thinking of KPIs and OKRs to track weekly, that’s great… but it’s not what I’m talking about.
Four years ago, having a well-integrated observability dashboard was optional for an early-stage team. You could assume system usage was limited and the entropy added by people changing your system was small, meaning every change could eventually be reviewed by the lead. I remember being on a team of four ICs at Google early in my career, and my manager reviewed every single PR. That worked because we produced like a 4-to-1 team; it won’t work if you produce like a 20-to-1 team.
The Mission Control Center, then, becomes vital to catch your team's “mistakes” before your users do. If your k8s nodes are suddenly restarting every five seconds, or your latest release has no network connectivity, you need to know, fast. And this is going to happen because LLMs make mistakes! And they make those mistakes faster than you can review them… hopefully.
As a leader, enforcing the idea of a single, centralized, up-to-date mission control center will allow you to:
React to issues before they become postmortems and churn problems.
Remove all data-brawls that may appear when something doesn’t work as expected.
Did you notice how this links to the First point?
It’s in writing & reading where you define your operating principles. ;)
In this case, that’s where you enforce the dashboard update, if needed.
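To make the "nodes restarting every five seconds" example concrete, here is a minimal sketch of the kind of rule a mission control center would run: count restarts in a sliding time window and flag when there are too many. Everything here is an assumption for illustration. The class name and thresholds are hypothetical, timestamps are plain seconds, and nothing calls a real Kubernetes or monitoring API; in practice you would most likely configure the equivalent rule in whatever observability product you use.

```python
from collections import deque


class RestartRateAlert:
    """Flags when too many restarts land inside a sliding time window."""

    def __init__(self, max_restarts: int, window_seconds: float) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._events: deque[float] = deque()  # restart timestamps, oldest first

    def record(self, timestamp: float) -> bool:
        """Record one restart; return True if the alert should fire."""
        self._events.append(timestamp)
        # Drop restarts that have aged out of the window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_restarts


# Hypothetical policy: more than 3 restarts within 60 seconds is an incident.
alert = RestartRateAlert(max_restarts=3, window_seconds=60)

# A node restarting every five seconds, as in the example above.
fired = [alert.record(t) for t in (0, 5, 10, 15, 20)]
print(fired)
```

With these thresholds the alert fires on the fourth restart, 15 seconds in. Whatever the numbers, the thresholds themselves are exactly the kind of operating principle that belongs in a written, reviewed document.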
Third: Eliminating the Toil!
This is the most difficult and entertaining part of the job, and it is easy to remember: don’t do the same thing twice!
The most challenging part about toil is the mental space it consumes. If you are thinking all day about that painful five-minute task you need to do, the burden isn't just those five minutes; it's the weight you carried throughout the day. I like to think of it as the source of Parkinson’s Law.
In my experience, the biggest pain of asking people to eliminate toil is that the activity itself becomes toil, which is our favorite thing to procrastinate on. I suggest you follow three strategies to get the most out of it:
Buy, buy, buy
Toil is, by definition, context to your business, not core. It’s that thing you need to do, and if you do it well, nobody cares. On top of that, having toil is detrimental to your team's motivation. Repetitive tasks that don’t help them advance in their self-actualization end up being a source of burnout for talented individuals.
My advice, if you can, is to buy solutions. Why?
You suddenly have an expert team
They are improving that area of your software; it’s their core value. Think of how you don’t develop your own CRM, HRIS, ERP, or ATS, even though you could. Apply that thinking to each area of your team, e.g., QA automation, Observability, or performance measurement.
You can focus on your business
You don’t have to track progress, manage, or invest time in these areas, and that’s your competitive advantage now. You do not need to understand how to build the best platform for logging, testing, or extracting data from Bitbucket to measure productivity.
Mix it up
Some things you won’t be able to buy, like creating that release dashboard or defining your set of test cases. For those, make them a standard part of a larger project. That way, it’s just eating the whole cake; even if you only like the frosting, at least you get some.
Rotations! Rotations! Rotations!
If you have build-cop or release rotations, a good strategy is to use those weeks for the person on rotation to work on minimizing toil. You should take it one step further and actually reward toil minimization. A good strategy is to give the person a ‘skip’ in the rotation queue or even a day off if the time saving was significant.
Finally, when focused on toil, always remember to ask: Is It Worth The Time?
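That question comes down to simple arithmetic: time saved per run, times how often the task runs, over some horizon, compared with the time it takes to automate. A minimal sketch follows, assuming a one-year horizon and a task done once per working day; both are my assumptions for illustration, and the function names are hypothetical.

```python
def minutes_saved(task_minutes: float, runs_per_week: float, horizon_weeks: int = 52) -> float:
    """Total minutes the task would consume over the horizon if left manual."""
    return task_minutes * runs_per_week * horizon_weeks


def worth_automating(
    task_minutes: float,
    runs_per_week: float,
    automation_minutes: float,
    horizon_weeks: int = 52,
) -> bool:
    """True if automating costs less time than the task consumes over the horizon."""
    return minutes_saved(task_minutes, runs_per_week, horizon_weeks) > automation_minutes


# The painful five-minute task, done once per working day for a year.
yearly = minutes_saved(task_minutes=5, runs_per_week=5)
print(f"{yearly:.0f} minutes, about {yearly / 60:.1f} hours per year")

print(worth_automating(5, 5, automation_minutes=8 * 60))   # one day of work
print(worth_automating(5, 5, automation_minutes=40 * 60))  # one week of work
```

Here the task costs 1,300 minutes a year, so a one-day automation pays off and a one-week automation does not within that horizon. Note that this only counts clock time; it ignores the mental weight of carrying the task around all day, so treat the result as a conservative lower bound on the benefit.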
the-one-thing: Never be the bottleneck
It’s easy to decide to implement these mechanisms and delay everything else until they are ready. In fact, it’s even easier to say, ‘we are not delivering on time because we need to do <toil>’; that makes you [Tech, QA, Product, etc.] the bottleneck. The other dirty little secret is: If you are the bottleneck, if your organization is the bottleneck, you are not doing a good job.
My advice is that whenever you implement this process, you agree on a timeline, hold yourself accountable to it, and ensure everything is done so that your team is not the bottleneck. Internalize the following:
In the eyes of a founder, it’s better to have tried and failed than to have never tried and saved resources, even with a 100-to-1 chance. That is how they’ve been successful at building an amazing startup.