Minimum Viable Operations

Several years ago I started using the name Minimum Viable Operations to describe a concept I’d been tinkering with in relation to Agile, DevOps, flow, service management and operational risk.  I define it as:

The minimum amount of processes, controls and security required to expose a new system or service to a target consumer whilst achieving the desired level of availability and risk.

These days I see Site Reliability Engineering (SRE) rapidly emerging as a set of processes that help achieve this.  Whilst chatting with a friend about it earlier today I realised I’d never published a blog post about it, so here it is.

When you need Minimum Viable Operations

These days lots of organisations have successfully sped up the process of innovating and developing new applications using modern lightweight architectures and cloud.  They are probably using an agile development methodology and hopefully some good Continuous Delivery, perhaps even using something like the Accenture DevOps Platform (ADOP).  They may have achieved this by innovating on the edge, i.e. creating an autonomous sub-organisation free from the trappings and controls of the mainstream IT department.  I’ve worked with quite a few organisations who have done this in separate, possibly more “trendy”, offices known as Digital Hubs, Innovation Labs, Studios, etc.

But when the sprints are over, the show and tell is complete and the post-it notes start to drop off the wall… was it all worth it?

If hundreds or maybe even thousands of person-hours rich with design thinking, new technologies and automation culminate in an internal demo that doesn’t reach real end users, you haven’t become agile.  If you’ve accidentally diluted the term Minimum Viable Product (MVP) to mean Release 1, and it took six months to complete, you haven’t become agile.

Whilst prototypes do serve a purpose, the highest value to an entrepreneurial organisation comes from proving or disproving whether target users are willing to use and keep using a product or service, and whether there are enough users to make it commercially viable.

In The Lean Startup, Eric Ries calls these the Value Hypothesis and the Growth Hypothesis.  Testing them requires putting something “live” in front of real users ASAP, and to do that successfully I think you need Minimum Viable Operations (oh why not, let’s give it a TLA: MVO).

Alternative Options to MVO

Modern standalone agile development organisations have four options:

  1. Don’t bother with any operations and just keep building prototypes and dazzling demos.  The downside is that at some point someone may realise you don’t make money and shut you down.
  2. Don’t bother with MVO but try to get your prototype turned into a live system via a project in the traditional IT department.  The downside is that time to market will be huge and the rate of change of your idea will hit a wall.  Change Request (CR), anyone?
  3. Don’t bother with MVO and just put prototypes live.  The downside here is a potentially unbounded risk of reputational and financial damage through outages and security disasters.
  4. Implement MVO, get things in front of customers, get into a nice, tight, lean OODA loop or PDCA cycle (perhaps even using the Cynefin framework), and overall create a learning environment capable of exponential growth.  The downside here is that it might feel a lot less safe than prototyping, and you will have to learn to celebrate failure.

Obviously option 4 is my favourite.

Implementing MVO

Aside: for simplicity’s sake, I’m conveniently ignoring the vital topic of organisational governance and politics.  Navigating it is clearly critical, but it is beyond what I want to get into here.

Implementing Minimum Viable Operations is essentially a risk management exercise with a lot in common with standard Operations Architecture (and nowadays SRE).  It means thinking very hard, and upfront, about the “ilities” such as reliability, availability, scalability and security.  But everything needs to be done much, much faster, and firstly that requires you to be much more realistic and to think harder:

  • Reliability – are three nines (99.9%) really needed to get your Minimum Viable Product in front of customers and measure their reactions?  Possibly you don’t actually need to create a perfect user experience for everyone all the time, and can evaluate your hypotheses based on the usage you are able to achieve.  Obviously there is a reputational risk in putting out products with a level of service quality unbefitting of your brand.  But do you actually need to apply your brand to the product?  Or perhaps you can incentivise users to put up with unreliability via other rewards, for example by not charging.
  • Availability – we live in a 24 by 7, always-on world, but at the start, does our MVP need to be always online?  Perhaps if we strictly limit use by geography, downtime windows become painless.  Perhaps, as an extension of the reliability argument above, taking the system down for maintenance, re-design, or even to discontinue it, is acceptable.
  • Scalability – perhaps if our service gets so popular that we cannot keep it live, that isn’t such a bad thing; in fact it could be exactly the problem we want to have.  It’s like the old “premature optimisation is the root of all evil” argument.  There are lots of ways to scale things in the cloud when needed.  Perhaps perfecting this isn’t a prerequisite to going live.
  • Security – obviously this is a vital consideration.  But again it must be strictly “right-sized”.  If a system doesn’t contain personal data or take payment information, you aren’t storing data and you aren’t overly concerned about availability, perhaps security doesn’t have to be quite the barrier to going live that you thought it might be.  Plus, if you are practising good DevSecOps in your development process and using a tool such as Sonatype Nexus Lifecycle, you are off to a good start.
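To put the three nines question into numbers, here is a minimal sketch that converts an availability target into a monthly downtime budget. It assumes a 30-day month and treats availability simply as the fraction of wall-clock time the service is up; real SLOs are often measured differently (for example, per request), and the function name is purely illustrative.

```python
# Assumption: a 30-day month, and availability measured simply as the
# fraction of wall-clock time the service is up.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability: float,
                             period_minutes: int = MINUTES_PER_MONTH) -> float:
    """Return the downtime budget, in minutes, implied by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    # Compare a few candidate targets for an MVP.
    for target in (0.99, 0.995, 0.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.1%} availability -> {minutes:.1f} minutes of downtime per month")
```

Over a 30-day month, 99.9% leaves about 43 minutes of downtime, whereas 99% leaves 432 minutes (7.2 hours). Each extra nine cuts the budget tenfold, which is why it is worth asking whether an MVP really needs it.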

Secondly, unlike a traditional Operations Architecture scenario where SLAs, RPOs, etc. may have been rather arbitrarily defined (perhaps copied and pasted into a contract), when creating MVO for an MVP you may actually have the opportunity to change the MVP design to make it easier to put live.  Does the MVP actually need to be branded with the company name, take payment, require registration, etc.?  If none of that affects your hypotheses, consider dropping them and going live sooner.

Finally, as a long-time fan of PaaS, I have to mention that organisations can make releasing MVPs much easier if they have invested in a reusable hosting platform (which of course may itself benefit from MVO).

What can someone do with this

I’ve seen value in organisations creating a relatively light methodology for doing something similar to MVO.  Nowadays they might also (perfectly reasonably) call it implementing SRE.  It could include:

  1. Guidelines for making MVPs smaller and easier to put live and operate.
  2. Risk assessment tools to help right-size Operations.
  3. Risk profiles for different types of application.
  4. Peer review checklists to validate someone else’s minimum viable operations architecture.
  5. Reusable, opinionated platform (PaaS) services that natively provide a degree of safety as well as freedom.
  6. Reusable security services.
  7. Reusable monitoring solutions.
  8. Training around embracing learning from failure and creating good customer experiments.
  9. A general increase in focus on and awareness of how quickly we are actually putting things live for clients.
  10. When and how to transition from MVO to a full blown SRE controlled service.
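As an illustration of items 2 and 3, a risk profile can start as little more than a checklist that maps an MVP’s characteristics to the operational controls it needs. The sketch below is entirely hypothetical: the MvpProfile fields and control names are my own illustrative choices, drawn from the questions raised earlier (personal data, payments, stored data, branding, registration, 24 by 7 availability), not a standard or an existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MvpProfile:
    """Hypothetical risk-relevant characteristics of an MVP."""
    handles_personal_data: bool
    takes_payments: bool
    stores_data: bool
    company_branded: bool
    requires_registration: bool
    needs_24x7: bool


# Hypothetical baseline that every live MVP gets, however small.
BASELINE_CONTROLS = ["basic monitoring and alerting", "dependency vulnerability scanning"]


def required_controls(profile: MvpProfile) -> list[str]:
    """Map an MVP profile to an illustrative, right-sized list of controls."""
    controls = list(BASELINE_CONTROLS)  # copy so the baseline is never mutated
    if profile.handles_personal_data or profile.requires_registration:
        controls.append("data protection review")
    if profile.takes_payments:
        controls.append("payment security review")
    if profile.stores_data:
        controls.append("backups with an agreed RPO")
    if profile.company_branded:
        controls.append("brand and reputation sign-off")
    if profile.needs_24x7:
        controls.append("on-call rota")
    return controls


if __name__ == "__main__":
    # An unbranded, anonymous, stateless experiment that can tolerate downtime.
    experiment = MvpProfile(
        handles_personal_data=False,
        takes_payments=False,
        stores_data=False,
        company_branded=False,
        requires_registration=False,
        needs_24x7=False,
    )
    print(required_controls(experiment))
```

The design choice is that dropping a feature from the MVP (say, registration or payment) visibly shrinks the list of controls needed before going live, which is exactly the trade-off behind simplifying the MVP design.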

Final Comment

DevOps has, alas, in many cases become conflated with implementing Continuous Delivery into testing environments.  Sometimes the last mile into Production is still too long.  Call it MVO (probably don’t!), call it SRE (why not?), or still call it DevOps, Agile or Lean (why not?)… I recommend giving some thought to MVO.