In part 1 of this three-part blog series about the Team Topologies book, I summarised a broad set of general factors that can help make teams successful. In this part I get to actual team design, i.e. the types of team and how they should interact.
The Four Team Types
Based on their extensive research, the authors present 4 types of team and make an important assertion: to be an effective organisation you only need a mixture of teams conforming to these 4 types.
Personally, I strongly identified with the 4 types, but even if you disagree, it is still very useful to be able to reference, extend or update the common set of types the book describes in such great detail.
The 4 types are as follows.
Stream Aligned Teams
This is the name the authors coined for an end-to-end team that owns the full value stream for delivering changes to software products or services AND for operating them. It is the stereotypical, modern and popular “you build it, you run it” team.
To do their job, Stream Aligned teams need members who collectively have all the skills required, e.g. business knowledge, architecture, development and testing, UX, security and compliance, and monitoring and metrics. This lets them minimise queues and handoffs to other teams, especially compared with teams made up of people who each perform a single function such as testing.
The expectation is that by owning their whole value stream, including the performance of the system in production, teams can optimise for rapid, small-batch changes and reap the expected benefits for both agility and safety. The hope is also that this end-to-end scope helps teams achieve “autonomy, mastery and purpose”, the things Daniel Pink highlights as most important to knowledge workers.
The cognitive load placed on them depends on a few things:
- If the team has enough of the requisite skills, its intrinsic cognitive load will be manageable.
- If its modes of interaction with other teams are clean and efficient, its extraneous cognitive load shouldn’t be too high. The team can also reduce this internally through automation.
- If the team was formed by using fracture planes effectively, its scope (and therefore its germane cognitive load) can be kept to a manageable size.
Overall this is a very compelling team type, and indeed the book recommends that most teams in an organisation take this form.
The book goes into much more detail about how to make a Stream Aligned team effective, and I highly recommend reading it.
If the book had only included Stream Aligned teams, I would have considered it a cynical attempt to document fashionable ideas while ignoring the factors that have led many organisations to adopt other team types. Fortunately, the authors did a great job of considering the whole picture through 3 other types (plus a bonus type, SRE; read on!).
Platform Teams
Consider a tech start-up that begins with a single Stream Aligned team owning the full stack and software lifecycle end to end. At some point the team will run up against Dunbar’s number and cognitive overload, and it will be time to look for a fracture plane that allows it to split into at least two teams. Separating the platform from the application is a very successful and well-established pattern, so at this point we could have one Stream Aligned team working on the application and one on the platform.
Now suppose the business grows, demand for application complexity keeps increasing, and the application team splits into two teams focused on different application value streams. Both teams will most likely use the same platform team, so we get the benefit of re-use. This can of course repeat many times. The authors recognised that running a platform team differs enough (especially in the type of coupling to other teams and the effective modes of communication) to justify a separate team type, which they called the Platform Team.
Platform Teams create and operate something re-usable that meets one or more application hosting requirements. This could mean running platform applications such as Kubernetes, a Content Management System as a service or IaaS, or even wrapping third-party as-a-service offerings such as a Relational Database Service. They may also layer on additional features valuable to the platform’s consumers, such as security and compliance scanning.
There are, however, potential traps with platform teams. If the platform is too opinionated and, worse, its use is mandated across the organisation, the impedance mismatch may do consuming applications more harm than good. If a platform isn’t observable and applications are deployed into a black box, application teams will lack detail about how their applications are performing in production and will be powerless to improve them. The book goes into a lot of detail about how to avoid creating bad platform teams and argues strongly for keeping platforms as thin as possible, which it calls the Thinnest Viable Platform.
Personally, I think the dynamic that occurs when consuming a platform from a third party is very powerful for a number of reasons:
- The platform provider is under commercial pressure to create a platform good enough that consumers pay to use it.
- The platform provider has to be able to deliver the service at a cost below the total revenue its users will give it (so it will be efficient and choose its offered services carefully).
If organisations at least try to think this way (with or without an internal currency of “wooden dollars”), I think they will create an effective dynamic. The book notes that Don Reinertsen recommends internal pricing as a mechanism to stop platform consumers demanding things they don’t need.
Enabling Teams
The next type of team exists to serve other teams and help them improve. I think it is great that the book acknowledges and explores these teams, as they are very common and often a good thing. Enabling teams should contain experienced people who are excited about finding ways to help other teams improve. To some extent they can be thought of as technical consultants with a remit to help drive improvement.
Most important is how well Enabling teams engage with other teams. There are various traps such a team can fall into and must avoid:
- Becoming an ivory tower defining processes, policies, perhaps even technical ‘decisions’ and inflicting them upon the teams that they are supposed to be helping.
- Being generally disruptive, causing interruptions, context switching, extra cognitive load and communication overhead, especially where this cost outweighs the benefit.
As with the other types, the book provides detailed expected behaviours and success criteria for Enabling teams, such as understanding the needs of the teams they support and taking an experimental approach to meeting them. It’s also important that Enabling teams bring in a wider perspective, for example awareness of technology advances and industry trends in technology and process. Enabling teams may be long-lived, but their engagement with any team they support should probably not be permanent.
Complicated-Subsystem Teams
Finally, the book defines a 4th type of team called a complicated-subsystem team. Essentially, these teams own the development and maintenance of a complicated subcomponent that is probably consumed as code or a binary, rather than at runtime as a service over a network. The idea is that the component requires specialist knowledge to build and change, and that this work is done most effectively by a dedicated team that is not also carrying the cognitive load of consuming and deploying the component.
Other types of teams: SRE teams
The book also acknowledges some team types outside the main 4, for example SRE teams.
Without getting into too much detail, Site Reliability Engineering (SRE) is an approach to operations that Google developed and started sharing with the world a couple of years ago. What is most interesting about it from a team design perspective is that an SRE team is effectively a traditional, separate operations team, so it doesn’t really fit the 4 topologies above. I think there are a couple of reasons the SRE model is successful:
- At least at Google, the default mode of operation is Stream Aligned teams. An SRE team will only operate an application if it is demonstrably reliable and doesn’t require a lot of manual effort to operate.
- It promotes use of some conceptually simple but effective metrics to ensure the application meets the standards for an SRE to operate it.
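To make the idea of a “conceptually simple” metric concrete: one of the central SRE measures is the error budget, i.e. the amount of unreliability an availability Service Level Objective (SLO) permits over a period. The sketch below is my own illustration rather than anything from the book; the 99.9% SLO, the 30-day window and the `budget_exhausted` gate are assumptions chosen for the example.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed unavailability for an availability SLO over a time window.

    `slo` is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Budget left after the observed downtime (negative once overspent)."""
    return error_budget(slo, window) - downtime


def budget_exhausted(slo: float, window: timedelta, downtime: timedelta) -> bool:
    """Hypothetical gate: True once the service has used up its error budget."""
    return budget_remaining(slo, window, downtime) <= timedelta(0)


if __name__ == "__main__":
    month = timedelta(days=30)
    print(error_budget(0.999, month))  # 0:43:12, i.e. 43.2 minutes
    print(budget_exhausted(0.999, month, timedelta(minutes=50)))  # True
```

With a 99.9% SLO over 30 days the budget is about 43 minutes, so 50 minutes of downtime would exhaust it. In Google’s published SRE practice, running out of budget is a signal to prioritise reliability work over new releases; a single number like this could also give both an SRE team and a Stream Aligned team a shared, objective view of whether an application is reliable enough to hand over.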
The book says that if your systems are reliable and operable enough, you can use SRE teams. Doing so will of course reduce the cognitive load on the Stream Aligned team, which has to pay less attention to the demands of the application in production. This is interesting because, in some ways, a Stream Aligned team handing applications over to an SRE team to operate is very similar to a traditional split between Dev and Ops teams.
So, in a way, the message becomes: if your processes are mature enough and your engineering good enough, traditional ways will suffice, BUT they probably aren’t, so you need Stream Aligned teams until you get there. I think this suggests an alternative option for organisations: adopt SRE measurement directly and focus on quality until your existing separate Dev and Ops teams perform well enough. I would even suggest this justifies a first-class fifth team type.
Other types of teams: Service Experience
The book also acknowledges that many business models include call centre and service desk teams who engage directly with customers. It proposes keeping these teams separate from the team types above, while ensuring that information about the operation of the live system makes its way back to the Stream Aligned teams. It also discusses grouping Service Experience teams with, and tightly aligning them to, Stream Aligned teams. The name Service Experience emphasises their customer orientation.
A key success criterion is the relationship with Stream Aligned teams, which must be close and ideally 1 to 1. I think this is an excellent point: in many organisations, the 1-to-many (overly shared) relationships between teams that should be more closely aligned cause as many problems as the team boundaries themselves. For example, if an application support team is spread across too many applications, it may achieve some economies of scale, but the service it delivers to each supported team will be far less effective than if it were subdivided into smaller, more closely aligned teams.
If you spotted other team types that the book acknowledges as acceptable, let me know.
The Three Modes of Team Interaction
The book defines 3 interaction modes and recommends which team types should use each one. By interaction, the book means how and when teams communicate and collaborate with each other. Central to this is the idea that communication is expensive and erodes team structures, so it should be used wisely.
The first interaction mode the book defines is called Collaboration. This is a regular, close, multi-directional interaction. It is fast, often real time, and therefore responsive and agile. It is the mode that enables teams to work most closely together, and it is useful when there is a high degree of uncertainty or change and co-evolution is needed.
The cost of this mode is higher communication overhead and increased cognitive load, because each team needs to know more about the other.
Stream Aligned teams are the most likely to use this mode, probably with other Stream Aligned teams and especially early in a product’s life, when uncertainty is highest and boundaries are still evolving.
The second mode is called X-as-a-Service. A team wishing to use this interaction mode needs to consciously design and optimise it so that it serves both them and the other parties as effectively as possible.
Making this mode effective means hiding unnecessary detail from other teams and making the interface discoverable and self-documenting, possibly exposing it as an API.
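As a rough illustration of what hiding unnecessary detail might look like in code, here is a hypothetical platform interface for provisioning databases. Every name in it (`DatabaseService`, `DatabaseRequest`, `onboard`, the `db.platform.internal` URL) is invented for the example and is not from the book; the point is that consumers specify only a name and a size, while hosts, replication and backups stay behind the interface.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class DatabaseRequest:
    """Everything a consuming team needs to specify, and nothing more."""
    name: str
    size_gb: int = 10


@dataclass(frozen=True)
class DatabaseHandle:
    """What the consuming team gets back: just enough to connect."""
    name: str
    connection_url: str


class DatabaseService(Protocol):
    """The published, self-documenting contract of a hypothetical platform."""

    def provision(self, request: DatabaseRequest) -> DatabaseHandle:
        """Create a database and return its connection details.

        Hosts, replication and backups are the platform team's concern
        and deliberately do not appear in this interface.
        """
        ...


class InMemoryDatabaseService:
    """Toy implementation standing in for the platform team's real one."""

    def __init__(self) -> None:
        self._databases: Dict[str, DatabaseHandle] = {}

    def provision(self, request: DatabaseRequest) -> DatabaseHandle:
        if request.name in self._databases:
            raise ValueError(f"database {request.name!r} already exists")
        if request.size_gb <= 0:
            raise ValueError("size_gb must be positive")
        handle = DatabaseHandle(
            name=request.name,
            connection_url=f"postgresql://db.platform.internal/{request.name}",
        )
        self._databases[request.name] = handle
        return handle


def onboard(service: DatabaseService, app_name: str) -> str:
    """A consuming team's code depends only on the contract, not the implementation."""
    return service.provision(DatabaseRequest(name=app_name)).connection_url
```

Because `onboard` depends only on the `DatabaseService` protocol, the platform team can change what sits behind it without consuming teams having to relearn how to integrate, which is exactly the stability this mode relies on.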
The downside of this mode is that, if it isn’t implemented effectively, it may slow the flow of other teams. The interface also needs to stay relatively stable so that consuming teams don’t have to keep relearning how to integrate.
As you’ve probably guessed, this is a great mode for a Platform team to adopt, especially one that is re-used by many consumers. It is most effective when the platform is well established and does not require rapid change or co-evolution of its API.
The final mode is called Facilitating. This is the best mode for Enabling teams and describes how they can be helpful without being overly demanding.
The book describes all of these modes in much more detail than this, with very actionable advice.
The book has a section on static topologies, i.e. how teams interact at a point in time. However, it stresses the importance of sensing and adapting: as a team matures, its interaction modes and even its type may change, or it may subdivide.
The book uses SRE as a good example of teams evolving: from Stream Aligned, to a function of reliability specialists (who may or may not call themselves SREs) operating as an Enabling team, to an SRE team acting as a separate Ops team, and back again. Obviously, there is a tension between making these changes to keep teams effective and the cost of changing teams and resetting them back to the Storming stage (as per Tuckman).
The book even offers advice on helping existing functional or traditional teams transition to the new model, for example infrastructure teams to Platform teams and support teams to Stream Aligned teams.
In the final part of this series, I will share some thoughts about how to use this information.