Team Topologies Book Summary – Part 3 of 3: Taking Action

In part 1 of this 3-part blog series about the Team Topologies book, I summarised a large set of general things that can help make teams successful. In part 2 I covered the book’s ideas on team design and team interaction modes.  It has been an epic so far, and even so my summary covers only parts of the book, with much less detail and specific advice than the original.  In this final blog of the series I will leave you with some things to consider doing with all of this excellent information.

From the general guidance for effective teams, consider:

  • Are teams too large?
  • Do they have the right working environment?
  • Do they treat other team members well enough?
  • Are teams being allowed to stay together long enough?

Using the concept of Cognitive load, consider:

  • Do teams have the right skills to handle the intrinsic cognitive load expected of them?
  • Do teams use enough automation to minimise extrinsic cognitive load?
  • Do teams have effective interaction modes with other teams to minimise extrinsic cognitive load?
  • Do teams have a manageable amount of scope (germane cognitive load) or should they consider exploiting a fracture plane to divide up?
  • Could SRE processes be adopted to keep cognitive load down by continuing to divide it between a Dev team and an Ops (SRE) team, while also improving overall alignment, resilience and agility?

The terminology for team types and interaction types in the book is extremely helpful when thinking about the teams in your organisation.

Using the topologies, consider:

  • Which types do our existing teams align to?
  • If our teams do align to a type, are they following the recipes for success in the book or falling into the anti-patterns?
  • If our teams do not align to a type in the book, should they be altered so that they do?  Or do we really want to customise the types for our needs?
  • Are our teams effectively using the interaction types recommended in the book, or if not, are they communicating too much, and generating too much cognitive load?

If existing team structures do not fit the advice of this book and you see room for improvement, what factors have influenced them to become as they are today (e.g. budgeting / finance models)?  Are these things easily surmountable?

I hope this has all been useful and you are now inspired to read the full book, to get much deeper into this vital topic, and to start applying it to your own organisations!

I think the authors Matthew and Manuel did an awesome job and would like to thank them again for writing it.

Team Topologies Book Summary – Part 2 of 3: Topologies and Interaction Modes

In part 1 of this 3-part blog series about the Team Topologies book, I summarised a large set of general things that can help make teams successful.  In this part I will get to actual team design, i.e. the types of team and how they should interact.

The Four Team Types

Based on their extensive research, the authors present 4 types of team and make an important assertion: to be an effective organisation you only need a mixture of teams conforming to these 4 types.

Personally, I really identified with the 4 types, but even if you have different opinions, it is still very useful to be able to reference and extend / update the common set of types described (in such great detail) in the book.

The 4 types are as follows.

Stream Aligned Teams

This is the name the authors coined for an end-to-end team that owns the full value stream for delivering changes to software products or services AND operating them.  This is the stereotypical modern and popular “you build it, you run it” team.

For Stream Aligned teams to be able to do their job, they need to be comprised of team members that collectively have all of the different skills they need e.g. business knowledge, architecture, development and testing skills, UX, security and compliance knowledge, monitoring and metrics skills etc.  This enables them to play nicely with the requirement of minimising queues and handoffs to other teams (especially in comparison to teams comprised of people performing singular functions e.g. testing).

The expectation is that by teams owning their whole value stream including the performance of the system in production, they can optimise for rapid small batch size changes, and reap all of the expected benefits around both agility and safety.  The hope is also that this end to end scope might help teams achieve “autonomy, mastery and purpose” (things which Daniel Pink highlights as most important to knowledge workers).

The amount of cognitive load required of them depends on a few things:

  • If they have enough of the requisite skills in their team, intrinsic cognitive load will be manageable.
  • If the modes of interaction with other teams are clean and efficient, their extrinsic cognitive load shouldn’t be too high.  They can also minimise this within their team through automation.
  • If they were formed using fracture planes effectively, they can keep their scope (and therefore their germane cognitive load) to a manageable amount.

Overall this seems like a very compelling team type and indeed the book recommends that the majority of teams in organisations have this form.

The book goes into a lot more detail about how to make a Stream Aligned Team effective which I highly recommend reading.

If the book had only included Stream Aligned Teams, I would have considered it a cynical attempt to document fashionable ideas and ignore the factors that have led many people to have other team types.  Fortunately, the authors instead did a great job of considering the whole picture via 3 other types (plus a bonus type, SRE – read on!).

Platform Teams

In a tech start-up that begins with just one Stream Aligned Team owning the full stack and software lifecycle end-to-end, at some point they will run into Dunbar’s Number and cognitive overload.  It will be time to look for a fracture plane to enable splitting into at least two teams.  Separating the platform from the application is a very successful and well-established pattern.  At this point we could potentially have one Stream Aligned Team working on the application and one on the platform.

Let’s say the business and demand for application complexity continues to increase and the application team splits into two teams focussed on different application value streams.  We’re now in a situation where they both most likely re-use the same platform team and we get the benefit of re-using it.  This could of course repeat many times.  The authors recognised the nature of running a platform team (especially in terms of the type of coupling to other teams and effective modes of communication) differed enough to create a new team type and this they called Platform Team.

Platform Teams create and operate something re-usable for meeting one or more application hosting requirements.  These could be running platform applications like Kubernetes, a Content Management System as a service, IaaS, or even teams wrapping third party as-a-service services like a Relational Database Service.  They may also be layering on additional features valuable to the consumers of the platform such as security and compliance scanning.

But there are potential traps with platform teams.  If the platform is too opinionated and, worse, its usage is also mandated within an organisation, the impedance mismatch may do more harm than good for consuming applications.  If a platform isn’t observable and applications are deployed into a black box, the application teams will lack the detail they need about how their application is performing in production and will be disempowered to make a difference to it.  The book goes into a lot of detail about how to avoid creating bad platform teams and argues strongly for keeping platforms as thin as possible (which it calls a minimum viable platform).

Personally, I think the dynamic that occurs when consuming a platform from a third party is very powerful for a number of reasons:

  • The platform provider is under commercial pressure to create a platform good enough that consumers pay to use it.
  • The platform provider has to be able to deliver the service at a cost point below the total revenue its users will give it (so it will be efficient and choose its offered services carefully).

If organisations can at least try to think like that (with or without internal use of “wooden dollars”) I think they will create an effective dynamic.  The book mentions that Don Reinertsen recommends internal pricing as a mechanism for avoiding platform consumers demanding things they don’t need.

Enabling Teams

The next type of team is designed to serve other teams to help them improve.  I think it is great that the book acknowledges and explores these as they are very common and often a good thing.  Enabling teams should contain experienced people excited about finding ways to help other teams improve.  To some extent they can be thought of as technical consultants with a remit for helping drive improvement.

Most important is how well Enabling teams engage with other teams.  There are various traps such a team can fall into and must avoid:

  • Becoming an ivory tower defining processes, policies, perhaps even technical ‘decisions’ and inflicting them upon the teams that they are supposed to be helping.
  • Generally being disruptive by causing things like interruptions, context switching, cognitive load, and communication overhead – especially if this cost outweighs the benefit.

As with all other types, the book provides detailed expected behaviours and success criteria for enabling teams such as understanding the needs of the teams they support and taking an experimental approach to meeting these.  It’s also important that enabling teams bring in a wider perspective perhaps of ideas about technology advances and industry technology and process trends.  Enabling teams may be long lived but their engagement with the teams they are supporting should probably not be permanent.

Complicated-Subsystem Teams

Finally, the book defines a 4th type of team called a complicated-subsystem team.  Essentially these teams own the development and maintenance of a complicated subcomponent that is probably consumed as code or a binary, rather than at runtime as a service over a network.  The concept is that the component requires specialist knowledge to build and change and that can be done most effectively by a dedicated team without the cognitive load required to consume and deploy their component.

Other types of teams: SRE teams

I did feel the book acknowledged some other team types outside of the main 4, for example, SRE teams.

Without getting into too much detail, Site Reliability Engineering (SRE) is an approach to Operations that Google developed and started sharing with the world a couple of years ago.  The thing that is most interesting about it from a team design perspective is that an SRE team is a traditional separate operations team.  In this regard it doesn’t really fit the 4 topologies above.  I think there are a couple of reasons the SRE model is successful:

  1. At least at Google, the default mode of operation is Stream Aligned teams.  An SRE team will only operate an application if it is demonstrably reliable and doesn’t require a lot of manual effort to operate.
  2. It promotes use of some conceptually simple but effective metrics to ensure the application meets the standards for an SRE to operate it.
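
This summary does not name the metrics.  One common SRE pairing is a service level objective (SLO) with an error budget, so the sketch below assumes that.  The class, the function names and the numbers are all hypothetical; it only illustrates how a conceptually simple metric could gate whether an SRE team takes on an application, and is not a description of how Google or the book does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability objective for one application."""
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def error_budget_remaining(slo: Slo, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo.target) * total_requests
    if allowed_failures == 0:
        # A 100% target leaves no budget to spend at all.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


def sre_team_can_operate(slo: Slo, total_requests: int, failed_requests: int) -> bool:
    """Assumed rule: the SRE team only operates the app while budget remains."""
    return error_budget_remaining(slo, total_requests, failed_requests) > 0
```

With a 99.9% target and a million requests, roughly a thousand failures are allowed in the period; an application that has already failed twice that often would, under this assumed rule, stay with (or go back to) its Stream Aligned team.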

The book says that if your systems are reliable and operable enough you can use SRE teams.  Doing so will of course reduce some cognitive load on the Stream Aligned team, because it has to pay less attention to the demands of an application in production.  This is actually very interesting because in some ways a Stream Aligned team handing over applications to an SRE team to operate is very similar to a traditional Dev and Ops team.

So, the message in a way becomes: if your processes are mature enough and your engineering good enough, traditional ways will suffice, BUT the problem is they probably aren’t, and instead you need to use Stream Aligned teams until you get there.  I think this leads to an alternative option for organisations: directly implement the measurements used by SRE and focus on quality until you can make your existing separate Dev and Ops teams perform well enough.  I even suggest this justifies a first-class fifth team type.

Other types of teams: Service Experience

The book also acknowledges that many business models include call centre teams and service desk teams who engage with customers directly.  These teams are proposed to be kept separate from the team types above but also to ensure information about the operations of the live system makes it back to the Stream Aligned teams.  It also talks about the concept of grouping and tightly aligning Service Experience Teams with Stream Aligned teams.  The Service Experience Teams are named this to emphasise their customer orientation.

A key success criterion is the relationship to Stream Aligned teams, which must be close and ideally 1 to 1.  I think this is an excellent point: in many organisations, the 1-to-many (overly shared) relationships between teams that should be more closely aligned cause as many problems as the team boundaries themselves.  For example, if an Application support team is spread across too many applications, it may achieve some efficiencies of scale, but the overall service delivered to each supported team will be far less effective than if the team was subdivided into smaller, more closely aligned teams.

If you spotted other team types in the book that were acknowledged as acceptable – let me know.

The Three Modes of Team Interaction

The book creates 3 interaction modes and recommends which team types should use each one.  By interactions the book means how and when teams communicate and collaborate with each other.  Central to this is the idea that communication is expensive and erodes team structures, so should be used wisely.

The first interaction mode the book defines is called Collaboration.  This is a multi-directional, regular and close interaction.  It is fast, often real time, and therefore responsive and agile.  It is the mode that enables teams to work most closely together and is useful when there is a high degree of uncertainty or change and co-evolution is needed.

The cost of this mode is higher communication overhead and increased cognitive load – because teams will need to know more about the other teams.

Stream Aligned Teams will be the most likely to use this, probably with other Stream Aligned teams and especially earlier in the life of a particular product when uncertainty is highest, and boundaries are evolving.

The second mode is called X-as-a-Service.  A team wishing to use this interaction mode needs to consciously design and optimise it so that it serves both them and the other parties as effectively as possible.

Making this mode effective entails abstracting unnecessary detail from other teams and making the interface discoverable, self-documented and possibly an API.

The downside of this mode is that if it isn’t implemented effectively it may reduce the flow of other teams.  It also needs to stay relatively static so that consuming teams do not have to constantly relearn how to integrate.

As you’ve probably guessed this is a great model for a Platform Team to adopt – especially if they are highly re-used by many consumers.  It can be especially effective when platforms are well established and do not require rapid change or co-evolution of the API.

The final mode is called Facilitating.   This is the best mode for Enabling teams and describes how enabling teams can be helpful without being over demanding.

All of the above modes are described in much more detail in the book, which also gives very actionable advice for each.

Continuous Evolution

The book has a section on static topologies i.e. how teams may interact at a point in time.  However, it stresses the importance of sensing and adapting.  As the maturity of a team changes, interaction modes and even team types may change or subdivide.

The Team Topologies book uses SRE as a good example of teams moving from Stream Aligned, to a function containing reliability specialists (who may or may not call themselves SREs) operating as an enabling team, to an SRE team acting as a separate Ops team and then back.  Obviously, there is a tension between making these changes to keep teams effective and the cost of changing teams and resetting back to Storming (as per Tuckman).

The book even offers advice for helping existing functional or traditional teams transition into the new model for example: infrastructure teams to Platform teams, support teams to Stream Aligned teams.

In the final part of this series, I will share some thoughts about how to use this information.

How my team do root cause analysis

This blog is more or less a copy and paste of a wiki page that my team at work use as part of our Problem Management process.  It is heavily inspired by lots of good writing about blameless postmortems, for example from Etsy and the Beyond Blame book.  Hope you find it useful.

RCA Approach

This page describes a 7 step approach to performing RCAs.  The process belongs to all of us, so please feel free to update it.

Traditionally RCA stands for Root Cause Analysis.  However, there are two problems with this:

  1. It implies there is one root cause.  In practice it is often a cocktail of contributing causes as well as negative (and sometimes positive) outcomes.
  2. The name implies that we are on a hunt for a cause.  We are on a hunt for causes, but only to help us identify preventative actions.  Not just to solve a mystery or worse find an offender to punish.

Therefore RCA is proposed to stand for Recurrence Countermeasure Analysis.

Step 1: Establish “the motive”

Ask the following:

Question: Does anyone think anyone in our team did something deliberately malicious to cause this?  i.e. they consciously carried out actions that they knew would cause this or something of similar negative consequences or they clearly understood the risks but cared so little that they weren’t deterred?

and

Question: Does anyone think anyone outside our team… (as above).

The assumption here is that the answer is “NO” to both questions.  If it is “NO”, we can now proceed in a blameless manner, i.e. never stopping our analysis at a point where a person should (or could) have done something different.

If either answer is “YES”, this is beyond the scope of this approach.

Step 2: Restate our meaning of “Blameless”

Read aloud the following to everyone participating in the RCA:

“We have established that we don’t blame any individual either internal or external to our organisation for the incident that has triggered this exercise.  Our process has failed us and needs our collective input to improve it.  If at any point during the process anyone starts to doubt this statement or act like they no longer believe it we must return to Step 1.  Everyone is responsible for enforcing this.

What is at stake here is not just getting to the bottom of this incident, it’s getting to the bottom of this incident and every future occurrence of the same incident.  If anyone feels mistreated by this process, by human nature they will take actions in the future to disguise their actions to limit blame and this will damage our ability to continuously improve.”

Step 3: Restate the rules

During this process we will follow these rules:

  1. Facts must not be subjective.  If an assertion of fact cannot be 100% validated we should agree and capture our confidence level (e.g. High, Medium, Low).  We must also capture the actions that we could do to validate it.
  2. If we don’t have enough facts, we will prioritise the facts that we need to go away and validate before reconvening to continue.  Before suspending the process, agree a full list of “Things we wish we knew but don’t know”, capture the actions that we could do to validate them and prioritise the discovery.
  3. If anyone feels uncomfortable during the process due to any of the following, they must raise it immediately:
    1. Blame
    2. Concerns with the process
    3. Language or tones of voice
    4. Their ability to have their voice heard
  4. We are looking for causes only to inform what we can do to prevent re-occurrence, not to apportion blame.

Step 4: Agree a statement to describe the incident that warranted this RCA

Using an open discussion, attempt to reach a consensus over a statement that describes the incident that warranted this RCA.  This must identify the thing (or things) that we don’t want to happen again (including all negative side-effects).  Don’t forget the impact on people e.g. having to work late to fix something.  Don’t forget to capture the problem from all perspectives.

Write this down somewhere everyone can see.

Step 5: Mark up the problem statement

Look at the problem statement and underline every aspect of it that someone could ask “Why” about.  Try to take an outsider’s view: even if you know the answer or think something cannot be challenged, it is still in scope for being underlined.

Step 6: Perform the analysis

Document the “Why” question related to each underlined aspect in the problem statement.

For each “Why” question attempt to agree on one direct answer.  If you find you have more than one direct answer, split your “Why” question into enough more specific “Why” questions so that your answers can be correlated directly.

Mark up the answers as you did in Step 5.

Repeat this step until you’ve built up a tree with at least 5 answers per branch and at least 3 branches.  If you can’t find at least 3 branches, you need to ask more fundamental “Why” questions about your problem statement and answers.  If you can’t ask and answer at least 5 “Why”s per branch, you are possibly taking steps that are too large.

Do not stop this process with any branch ending on a statement that could be classified “human error”.  (Refer to what we agreed at step 1).

Do not stop this process at something that could be described as a “third party error”.  Whilst the actions of third parties may not be directly under our control, we have to maintain a sense of accountability for the problem statement where if necessary we should have implemented measures to protect ourselves from the third party.
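
The stopping rules above can be sketched as a small data structure with a checker.  This is only an illustration under my own assumptions: a “branch” is read as one path from a top-level “Why” down to an end point, and the `Why` class, its `category` label and the `check_tree` function are hypothetical names, not part of the wiki page.

```python
from dataclasses import dataclass, field
from typing import List

MIN_BRANCHES = 3
MIN_ANSWERS_PER_BRANCH = 5
# End points we agreed never to stop on (labels are hypothetical).
STOPPED_TOO_EARLY = {"human error", "third party error"}


@dataclass
class Why:
    question: str
    answer: str
    category: str = ""  # optional label agreed by the group
    children: List["Why"] = field(default_factory=list)


def paths(node: Why) -> List[List[Why]]:
    """Return every path from `node` down to an end point of the tree."""
    if not node.children:
        return [[node]]
    return [[node] + rest for child in node.children for rest in paths(child)]


def check_tree(roots: List[Why]) -> List[str]:
    """List the ways the analysis tree falls short of the Step 6 guidance."""
    problems = []
    all_paths = [p for root in roots for p in paths(root)]
    if len(all_paths) < MIN_BRANCHES:
        problems.append(f"only {len(all_paths)} branch(es), need {MIN_BRANCHES}")
    for path in all_paths:
        leaf = path[-1]
        if len(path) < MIN_ANSWERS_PER_BRANCH:
            problems.append(
                f"branch ending '{leaf.answer}' has {len(path)} answer(s), "
                f"need {MIN_ANSWERS_PER_BRANCH}"
            )
        if leaf.category in STOPPED_TOO_EARLY:
            problems.append(f"branch ends on {leaf.category}: '{leaf.answer}'")
    return problems
```

An empty list from `check_tree` means the tree meets the numeric guidance; it says nothing about whether the answers themselves are good, which still needs the group’s judgement.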

Step 7: Form Countermeasure Hypotheses

Review the end points of your analysis tree and make hypotheses about actions that could be taken to prevent future re-occurrences. Like all good hypotheses these should be specific and testable.

Use whatever mechanism you have for capturing and prioritising the proposed work to track the identified actions and get them implemented.  Use your normal approach to stating acceptance criteria and don’t close the actions unless they satisfy the tests that they have been effective.

ADOP with Pivotal Cloud Foundry

As I have written here, the DevOps Platform (aka ADOP) is an integration of open source tools that is designed to provide the tooling capability required for Continuous Delivery.

In this blog I will describe integrating ADOP and the Cloud Foundry public PaaS from Pivotal.  Whilst it is of course technically possible to run all of the tools found in ADOP on Cloud Foundry, that wasn’t our intention.  Instead we wanted to combine the Continuous Delivery pipeline capabilities of ADOP with the industrial grade cloud first environments that Cloud Foundry offers.

Many ADOP cartridges, for example the Java Petclinic one, contain two Continuous Delivery pipelines:

  • The first to build and test the infrastructure code and build the Platform Application
  • The second to build and test the application code and deploy it to an environment built on the Platform Application.

The beauty of using a Public PaaS like Pivotal Cloud Foundry is that your platforms and environments are taken care of leaving you much more time to focus on the application code.  However you do of course still need to create an account and provision your environments.

  1. Register here
  2. Click Pivotal Web Services
  3. Create a free tier account
  4. Create an organisation
  5. Create one or more spaces

With this in place you are ready to:

  1. Spin up an ADOP instance
  2. Store your Cloud Foundry credentials in Jenkins’ Secure Store
  3. Load the Cloud Foundry Cartridge (instructions)
  4. Trigger the Continuous Delivery pipeline.

Having done all of this, the pipeline now does the following:

  1. Builds the code (which happens to be the JPetStore).
  2. Runs the unit tests and performs Static Code Analysis using SonarQube.
  3. Deploys the code to an environment, also known in Cloud Foundry as a Space.
  4. Performs functional testing using Selenium and some security testing using OWASP ZAP.
  5. Performs some performance testing using Gatling.
  6. Kills the running application in the environment and waits to verify that Cloud Foundry automatically restores it.
  7. Deploys the application to a multi node Cloud Foundry environment.
  8. Kills one of the nodes in Cloud Foundry and validates that Cloud Foundry automatically avoids sending traffic to the killed node.
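
Steps 6 and 8 both come down to “break something, then wait and check that the platform has healed itself”.  The cartridge’s actual implementation is on Github and is not reproduced here; the sketch below is a generic polling helper written for illustration, with a hypothetical `wait_until_restored` function and a health check that the caller supplies (for example an HTTP request to the application’s route).

```python
import time
from typing import Callable


def wait_until_restored(
    is_healthy: Callable[[], bool],
    timeout_seconds: float = 120.0,
    poll_interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_healthy` until it returns True or the timeout expires.

    Returns True if the application came back in time, False otherwise.
    `sleep` and `clock` are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_seconds)
```

A pipeline stage could kill the application, call this helper, and fail the build if it returns False.  The timeout and polling interval shown are arbitrary defaults, not values taken from the cartridge.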

The beauty of ADOP is that all of this great Continuous Delivery automation is fully portable and can be loaded time and time again into any ADOP instance running on any cloud.

There is plenty more we could have done with the cartridge to really put the PaaS through its paces such as generating load and watching auto-scaling in action.  Everything is on Github, so pull requests will be warmly welcomed!  If you’ve tried to follow along but got stuck at all, please comment on this blog.

Join the DevOps Community Today!

As I’ve said in the past, if your organisation does not yet consider itself to be “doing DevOps” you should change that today.

If I was pushed to say the one thing I love most about the DevOps movement, it would be the sense of community and sharing.

I’ve never experienced anything like it previously in our industry.  It seems like everyone involved is united by being passionate about collaborating in as many ways as possible to improve:

  • the world through software
  • the rate at which we can do that
  • the lives of those working in our industry.

The barrier to entry to this community is extremely low, for example you can:

You could also consider attending the DevOps Enterprise Summit London (DOES).  It’s the third DOES event and the first ever in Europe and is highly likely to be one of the most important professional development things you do this year.  Organised by Gene Kim (co-author of The Phoenix Project) and IT Revolution, the conference is highly focused on bringing together anyone interested in DevOps and providing them as much support as humanly possible in two days.  This involves presentations from some of the most advanced IT organisations in the world (aka unicorns), as well as many from those in traditional enterprises who may be on a very similar journey to you.   Already confirmed are talks from:

  • Rosalind Radcliffe talking about doing DevOps with Mainframe systems
  • Ron Van Kemenade CIO of ING Bank
  • Jason Cox about doing DevOps transformation at Disney
  • Scott Potter Head of New Engineering at News UK
  • And many more.

My recommendation is to get as many people from your organisation along to the event as possible.  They won’t be disappointed.

Early bird tickets are available until 11th May 2016.

(Full disclosure – I’m a volunteer on the DOES London committee.)


Apple Music versus Continuous Delivery

Last week my phone received iOS update 8.4.1.  Being into Software Release Management, naturally I read the release note.

On a slight tangent, I’m ashamed to say that when it comes to my phone, I don’t practice Continuous Deployment of app updates – even though I know it’s been possible since iOS 7. Instead, I like to also read those release notes before deploying (it’s not that bad – I don’t raise a Change Request or run a CAB!)

Anyway, the release note explained that the reason for the update was an update to Apple Music (a new music streaming application aimed at challenging Spotify).  At this point my Release Management instincts were offended:

I have to update my whole Operating System in order to update just one application?

And one application I don’t actually use.  (NB I actually don’t use it because I’m tied to Spotify through my phone contract, not because I’ve yet tried and/or rejected Apple Music.)  Even worse, this upgrade caused me downtime on my phone.

So I wonder:

How long Apple will continue to release a monolithic Operating System and Music Application build package?

Or interestingly do they have other reasons for doing this?  (I notice they did bundle security updates.)

Spotify have a standalone application (thanks to the App Store, capable of Continuous Deployment of updates – for those so inclined) and they release very regular updates.  I can’t see people being happy with taking iOS Operating System updates as frequently (unless they can start happening without an outage).

In my opinion, whilst this situation continues Spotify have to possess a commercial advantage. Especially with all the great things we see and read about their agile culture.

Proposed Reference Architecture of a Platform Application (PaaA)

In this blog I’m going to propose a design for modelling a Platform Application as a series of generic layers.

I hope this model will be useful for anyone developing and operating a platform, in particular if they share my aspirations to treat the Platform as an Application and to:

Hold your Platform to the same engineering standards as you would (should?!) your other applications

This is my fourth blog in a series where I’ve been exploring treating our Platforms as an Application (PaaA). My idea is simple: whether you are re-using a third-party Platform Application (e.g. Cloud Foundry) or rolling your own, you should:

  • Make sure it is reproducible from version control
  • Make sure you test it before releasing changes
  • Make sure you release only known and tested and reproducible versions
  • Industrialise and build a Continuous Delivery pipeline for your platform application
  • Industrialise and build a Continuous Delivery pipeline within your platform.

As I’ve suggested, if we are treating Business Applications as Products, we should also treat our Platform Application as a Product.  With this approach in mind, clearly a Product Owner of a Business Application (e.g. a website) is not going to be particularly interested in detail about how something like high-availability works.

A Platform Application should abstract the applications using it from many concerns which are important but not interesting to them.  

You could have a Product Owner for the whole Platform Application, but that’s a lot to think about, so I believe this reference architecture is a useful way to divide and conquer.  To further simplify things, I’ve defined this anatomy in layers, each of which abstracts the next layer from the underlying implementation.

So here it is:

[Image: PaaA_Anatomy]

Starting from the bottom:

  • Hardware management 
    • Consists of: Hypervisor, Logical storage managers, Software defined network
    • The owner of this layer can make the call: “I’ll use this hardware”
    • Abstracts you from: the hardware and allows you to work logically with compute, storage and network resources
    • Meaning you can: within the limits of this layer (e.g. physical capacity or performance) consider hardware to be fully logical
    • Presents to the next layer: the ability to work with logical infrastructure
  • Basic Infrastructure orchestration 
    • Consists of: Cloud console and API equivalent. See this layer as described in OpenStack here
    • The owner of this layer can make the call: “I will use these APIs to interact with the Hardware Management layer.”
    • Abstracts you from: having to manually track usage levels of compute and storage, and from monitoring the hardware
    • Meaning you can: perform operations on compute and storage in bulk using an API
    • Presents to the next layer: a convenient way to programmatically make bulk updates to what logical infrastructure has been provisioned
  • Platform Infrastructure orchestration (auto-scaling, resource usage optimisation)
    • Consists of: effectively a software application built to manage creation of the required infrastructure resources required. Holds the logic required for auto-scaling, auto-recovery and resource usage optimisation
    • The owner of this layer can make the call: “I need this many servers of that size, and this storage, and this network”
    • Abstracts you from: manually creating the required infrastructure at the right scale and from changing this over time in response to demand levels
    • Meaning you can: expect that enough logical infrastructure will always be available for use
    • Presents to the next layer: the required amount of logical infrastructure resources to meet the requirements of the platform
  • Execution architecture 
    • Consists of: operating systems, containers, and middleware e.g. Web Application Server, RDBMS
    • The owner of this layer can make the call: “This is how I will provide the runtime dependencies that the Business Application needs to operate”
    • Abstracts you from: the software and configuration required for your application to run
    • Meaning you can: know you have a resource that could receive release packages of code and run them
    • Presents to the next layer: the ability to create the software resources required to run the Business Applications
  • Logical environment separation
    • Consists of: logically separate and isolated instances of environments that can be used to host a whole application by providing the required infrastructure resources and runtime dependencies
    • The owner of this layer can make the call: “This is what an environment consists of in terms of different execution architecture components and this is the required logical infrastructure scale”
    • Abstracts you from: working out what you need to create fully separate environments
    • Meaning you can: create environments
    • Presents to the next layer: logical environments (aka Spaces) where code can be deployed
  • Deployment architecture
    • Consists of: the orchestration and automation tools required to release new Business Application releases to the Platform Application
    • The owner of this layer can make the call: “These are the tools I will use to deploy the application and configure it to work in the target logical environment”
    • Abstracts you from: the details about how to promote new versions of your application, static content, database and data
    • Meaning you can: release code to environments
    • Presents to the next layer: a user interface and API for releasing code
  • Security model
    • Consists of: a user directory, an authentication mechanism, an authorisation mechanism
    • The owner of this layer can make the call: “These authorised people can make the following changes to all layers down to Platform Infrastructure orchestration”
    • Abstracts you from: having to implement controls over platform use.
    • Meaning you can: empower the right people and be protected from the wrong people
    • Makes the call: “I want only authenticated and authorised users to be able to use my platform application”
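
One way to sanity-check a layered model like this is to write it down as data.  The sketch below is only an illustration: the `Layer` class and the helper functions are hypothetical names of my own, and the “presents” strings are shortened paraphrases of the list above.  The Security model has no “presents” entry because the list names no layer above it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str
    presents: Optional[str]  # what the layer offers to the layer above it


# Bottom layer first, mirroring the list above.
LAYERS: List[Layer] = [
    Layer("Hardware management", "the ability to work with logical infrastructure"),
    Layer("Basic Infrastructure orchestration", "programmatic bulk updates to logical infrastructure"),
    Layer("Platform Infrastructure orchestration", "enough logical infrastructure for the platform"),
    Layer("Execution architecture", "the software resources needed to run Business Applications"),
    Layer("Logical environment separation", "logical environments where code can be deployed"),
    Layer("Deployment architecture", "a user interface and API for releasing code"),
    Layer("Security model", None),  # top of the stack in this model
]


def layers_below(name: str) -> List[str]:
    """Return the names of every layer underneath `name`, bottom first."""
    names = [layer.name for layer in LAYERS]
    if name not in names:
        raise KeyError(f"unknown layer: {name}")
    return names[: names.index(name)]


def provider_for(name: str) -> Optional[Layer]:
    """Return the layer directly underneath `name`, or None for the bottom layer."""
    below = layers_below(name)
    return LAYERS[len(below) - 1] if below else None
```

Mapping a real project onto this structure then becomes a matter of asking, for each layer, who owns it and what it actually presents to the layer above.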

I’d love to hear some feedback on this.  In the meantime, I’m planning to map some of the recent projects I’ve been involved with into this architecture to see how well they fit and what the challenges are.