How Psychologically Safe does your team feel?

As per this article, Google conducted a two-year-long study into what makes their best teams great and found psychological safety to be the most important factor.

As per Wikipedia, psychological safety can be defined as:

“feeling able to show and employ one’s self without fear of negative consequences of self-image, status or career”

It certainly seems logical to me that creating a safe working environment, where people are free to share their individual opinions and avoid groupthink, is highly important.

So the key question is how can you foster psychological safety?

Some of the best advice I’ve read was from this blog by Steven M Smith.  He suggests performing a paper-based voting exercise to measure safety.

Whilst we’ve found this to be a good technique, the act of screwing up bits of paper is tedious and hard to do remotely.  Hence we’ve created an online tool:

https://safetychecker.herokuapp.com/

Please give it a go and share your experiences!


Running the DevOps Platform on Google Compute Engine

Sometimes knowing something is possible just isn’t good enough.  So here is how I spun up the DevOps Platform on Google Compute Engine (GCE).

1. I needed a Google Compute Engine account.

2. I enabled the Google Compute APIs for my GCE account

3. I installed the Google Cloud SDK (command-line tools)

4. I opened the Google Cloud SDK Shell link that had appeared in my Windows Start menu and ran:

C:\> gcloud auth login

This popped open a Chrome window and asked me to authenticate against my GCE account.
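For reference, once authenticated you can also set a default project from the shell, and (on a reasonably recent SDK) enable the Compute Engine API there too.  The project ID below is a placeholder:

C:\> gcloud config set project my-gce-project
C:\> gcloud services enable compute.googleapis.com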

5. (Having previously installed Docker Toolbox, see here) I opened Git Bash (as an Administrator) and ran this command:

$ docker-machine create --driver google \
                 --google-project <a project in my GCE account> \
                 --google-machine-type n1-standard-2 \
                 markosadop01

You will notice that this is fairly standard.  I picked an n1-standard-2 machine type which is roughly equivalent to what we use for AWS.

6. I waited while a machine containing Docker was created in Google Compute Engine
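A quick way to confirm the machine is up is docker-machine ls (output below is trimmed and illustrative):

$ docker-machine ls
NAME           ACTIVE   DRIVER   STATE     URL
markosadop01   -        google   Running   tcp://104.197.235.64:2376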

7. I cloned the ADOP Docker Compose repository from GitHub:

$ git clone https://github.com/Accenture/adop-docker-compose
$ cd adop-docker-compose

8. I ran the normal startup.sh command as follows:

$ ./startup.sh -m markosadop01 -c NA

And hey presto:

...
SUCCESS, your new ADOP instance is ready!
Run these commands in your shell:
eval "$(docker-machine env $MACHINE_NAME)"
source env.config.sh
Navigate to http://104.197.235.64 in your browser to use your new DevOps Platform!

And just to prove it:

$ whois 104.197.235.64 | grep Org
Registrant Organization: Google Inc.
Admin Organization: Google Inc.
Tech Organization: Google Inc.

9. I had to go to Networks > Firewall rules and add a rule to allow HTTP to my server.
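The same rule can also be created from the command line, roughly as follows (the rule name is mine):

C:\> gcloud compute firewall-rules create allow-http --allow tcp:80 --source-ranges 0.0.0.0/0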

10. I viewed my new ADOP on Google instance in (of course…) Chrome!

Lovely stuff!

Start Infrastructure Coding Today!

* Warning this post contains mildly anti-Windows sentiments *

It has never been easier to get ‘hands-on’ with Infrastructure Coding and Containers (yes including Docker), even if your daily life is spent using a Windows work laptop.  My friend Kumar and I proved this the other Saturday night in just one hour in a bar in Chennai.  Here are the steps we performed on his laptop.  I encourage you to do the same (with an optional side order of Kingfisher Ultra).

 

  1. We installed Docker Toolbox.
    It turns out this is an extremely fruitful first step as it gives you:

    1. Git (and in particular GitBash). This allows you to use the world’s best Software Configuration Management tool, Git, and welcomes you into the world of being able to use and contribute to Open Source software on GitHub.  Plus it has the added bonus of turning your laptop into something which understands good wholesome Linux commands.
    2. Virtual Box. This is a hypervisor that turns your laptop from being one machine running one Operating System (Windoze) into something capable of running multiple virtual machines with almost any Operating System you want (even UniKernels!).  Suddenly you can run (and develop) local copies of servers that from a software perspective match Production.
    3. Docker Machine. This is a command line utility that will create virtual machines for running Docker on.  It can do this either locally on your shiny new Virtual Box instance or remotely in the cloud (even the Azure cloud – Linux machines of course)
    4. Docker command line. This is the main command line utility of Docker.  This will enable you to download and build Docker images, and turn them into running Docker containers.  The beauty of the Docker command line is that you can run it locally (ideally in GitBash) on your local machine and have it control Docker running on a Linux machine.  See diagram below.
    5. Docker Compose. This is a utility that gives you the ability to run and associate multiple Docker containers by reading what is required from a text file; a minimal example is sketched below.
    [Diagram: the Docker command line on Windows controlling the Docker daemon on a VirtualBox Linux VM]
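    To give a flavour of Docker Compose, a minimal docker-compose.yml might look like this (illustrative only, not something we ran that night):
    $ cat docker-compose.yml
    version: '2'
    services:
      web:
        image: nginx:latest
        ports:
          - "80:80"
    $ docker-compose up -d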
  2. Having completed step 1, we opened up the Docker Quickstart Terminal by clicking the entry that had appeared in the Windows start menu. This runs a shell script via GitBash that performs the following:
    1. Creates a virtual box machine (called ‘default’) and starts it
    2. Installs Docker on the new virtual machine
    3. Leaves you with a GitBash window open that has the necessary environment variables set to point the Docker command-line utility at your new virtual machine (essentially what ‘docker-machine env default’ prints, as sketched below).
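    For reference, those variables look roughly like this (values are illustrative for a local VirtualBox machine):
    $ docker-machine env default
    export DOCKER_TLS_VERIFY="1"
    export DOCKER_HOST="tcp://192.168.99.100:2376"
    export DOCKER_CERT_PATH="C:\Users\<you>\.docker\machine\machines\default"
    export DOCKER_MACHINE_NAME="default"
    # Run this command to configure your shell:
    # eval "$(docker-machine env default)"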
  3. We wanted to test things out, so we ran:
    $ docker ps -a
    CONTAINER ID  IMAGE   COMMAND   CREATED   STATUS   PORTS  NAMES

     

    This showed us that our Docker command line tool was successfully talking to the Docker daemon (process) running on the ‘default’ virtual machine. And it showed us that no containers were either running or stopped on there.

  4. We wanted to test things a little further, so we ran:
    $ docker run hello-world

    Hello from Docker.
    This message shows that your installation appears to be working correctly.

    To generate this message, Docker took the following steps:

    The Docker client contacted the Docker daemon.
    The Docker daemon pulled the "hello-world" image from the Docker Hub.
    The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
    The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

    To try something more ambitious, you can run an Ubuntu container with:

    $ docker run -it ubuntu bash

    Share images, automate workflows, and more with a free Docker Hub account:

    https://hub.docker.com

    For more examples and ideas, visit:

    https://docs.docker.com/userguide

    The output is very self-explanatory.  So I recommend reading it now.

  5. We followed the instructions above to run a container from the Ubuntu image.  This started a container running Ubuntu, and we ran a command to satisfy ourselves that we really were inside Ubuntu.  Note one slight modification: we had to prefix the command with ‘winpty’ to work around a tty-related issue in GitBash:
    $ winpty docker run -it ubuntu bash
    root@2af72758e8a9:/# apt-get -v | head -1
    apt 1.0.1ubuntu2 for amd64 compiled on Aug  1 2015 19:20:48
    root@2af72758e8a9:/# exit
    $ exit

     

  6. We wanted to run something else, so we ran:
    $ docker run -d -P nginx:latest

     

  7. This caused the Docker command line to do more or less what we saw before, with a few exceptions:
    • The -d flag caused the container to run in the background (we didn’t need -it).
    • The -P flag caused Docker to expose the ports of Nginx back to our Windows machine.
    • The image was nginx rather than ubuntu.  We didn’t need to specify a command for the container to run after starting (leaving it to run its default command).  An alternative run with an explicit port mapping is sketched below.
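    If you want a predictable port instead of a randomly assigned one, an explicit mapping also works (the container name here is purely illustrative):
    $ docker run -d -p 8080:80 --name mynginx nginx:latest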
  8. We then ran the following to establish how to connect to our Nginx:
    $ docker-machine ip default
    192.168.99.100

    $ docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                           NAMES
    826827727fbf        nginx:latest        "nginx -g 'daemon off"   14 minutes ago      Up 14 minutes       0.0.0.0:32769->80/tcp, 0.0.0.0:32768->443/tcp   ecstatic_einstein

     

  9. We opened a proper web browser (Chrome) and navigated to http://192.168.99.100:32769/ using the information above (your IP address and port may differ). Pleasingly we were presented with the ‘Welcome to nginx!’ default page. The same check can be made from the command line, as sketched below.
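    From GitBash, curl (which ships with Git for Windows) can do the same check; the IP and port are the ones we discovered above, so yours may differ:
    $ curl -sI http://192.168.99.100:32769/ | head -1
    HTTP/1.1 200 OK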
  10. We decided to clean up some of what we’d created locally on the virtual machine, so we ran the following to:
    1. Stop the Nginx container
    2. Delete the stopped containers
    3. Demonstrate that we still had the Docker ‘images’ downloaded

 

$ docker kill `docker ps -q`
8d003ca14410

$ docker rm `docker ps -aq`
8d003ca14410
2af72758e8a9
…

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
nginx                          latest              sha256:99e9a        4 weeks ago         134.5 MB
ubuntu                         latest              sha256:3876b        5 weeks ago         187.9 MB
hello-world                    latest              sha256:690ed        4 months ago        960 B

 

 

  11. We went back to Chrome and hit refresh. As expected, Nginx was gone.
  12. We opened Oracle VM VirtualBox from the Windows Start menu so that we could observe our ‘default’ machine listed as running.
  13. We ran the following to stop our ‘default’ machine and observed it stop in VirtualBox:
    $ docker-machine stop default

     

  14. Finally we installed Vagrant. This is essentially a much more generic version of Docker Machine, capable of creating virtual machines not just in VirtualBox for Docker, but for many other purposes.  For example, from an Infrastructure Coding perspective, you might run a virtual machine for developing Chef code.  A taste of the workflow is sketched below.
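    As a taste (the box name is illustrative and this wasn’t part of our hour):
    $ vagrant init ubuntu/trusty64
    $ vagrant up
    $ vagrant ssh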

 

Not bad for one hour on hotel wifi!

Kumar keenly agreed he would complete the following next steps.  I hope you’ll join him on the journey and Start Infrastructure Coding Today!

  1. Learn Git. It really only takes 10 minutes with this tutorial to learn the basics.
  2. Docker – continue the journey here
  3. Vagrant
  4. Chef
  5. Ansible

 

Please share any issues you hit following this and I’ll improve the instructions.  Please share any other useful tutorials and I will add those too.

At what point will the human race depend on DevOps?

Here are some highlights of DevOpsDays London November 2013 – Day 1 that I wanted to share.

Mark Burgess gave an interesting talk based on his new book In Search Of Certainty.  For those that don’t already know Mark, he was the creator of CFEngine, which is the father of Puppet and the grandfather of Chef.  CFEngine began 20 years ago, so it is very interesting for those of us who have only a couple of years’ experience working with active Configuration Management tools to learn the intricacies from someone who has been evolving and observing them for 20 years.

Mark started off challenging the notion that “consistency equals quality”, recognising the almost organic nature of highly complex systems.  All in all, the talk was fairly challenging stuff to start the day with…

Mark ended his talk with a sobering thought: the human race is becoming so dependent on complex IT systems that at some point our whole prospects of survival will depend on them.  This left me thinking, it’s hard enough using DevOps techniques to deliver more continuously; did he have to raise the stakes so much higher!?

Doug Barth from PagerDuty gave a great talk called Failure Fridays, which could also be called Chaos Monkey for Mortals (i.e. artificially creating failure for testing when you are not Netflix).  It was a great session because he really explained the ins and outs of what they do, which is basically spending one hour per week (in total) systematically breaking, observing and fixing their system (only creating failures at a scale that they expect to be able to tolerate without impacting their customers).

Where I work, system delivery projects include formally testing all of these things up front, and we call it Operator Acceptance Testing.  However, I definitely like the idea of it being an ongoing activity (it certainly resonates with the “if it’s difficult, do it often” mantra).

Practical considerations of doing it weekly with the live system included:

  • constrain it to one hour (end to end)
  • involve all impacted parties in one room (a chat/video conference room if necessary)
  • lots of planning and careful assessment of risk beforehand
  • be careful what you disable:
    • disabling auto-healing mechanisms = probably good
    • disabling monitoring and alerts (things you want to test) = bad
  • effective communications before, during and afterwards (this is not like a surprise fire drill)
  • start small, e.g. one server, and scale up to a bank of them

Doug also described other virtues of this activity, such as creating a great opportunity to train new team members and increasing developers’ focus on failure.  I’d bet that at least half the people in the audience went away wanting to implement Failure Fridays, and many of them will actually do it.  Just not on a Friday though – why jeopardise Saturdays?

Ben Hughes from Etsy gave the next talk, about security, and it was a very lively, entertaining and informative session.  Rather than try to do it justice, I just highly recommend watching the video of his 5-minute ignite from DevOpsDays Portland.  However, there is one quote I can’t resist sharing here (credited to Ben’s boss, apparently):

“If you go down from 35% on fire to 24% on fire, you are still on fire”

– referring to issues that you have to accept exist and manage, rather than trying to avoid or deny the possibility of them happening.

After my ignite, where I stood up for DevOps teams, I’ll admit my head was a bit of a blur and I didn’t take any notes.  However, a special highlight goes to Daniel Breston for teaching us about oobeya.

The afternoon Open Sessions of course delivered some lively debate.  It was particularly interesting to see some early examples of discussions about Docker versus Configuration Management.  A debate that is likely to rage!

All in all, an excellent first day (even without needing to mention the free bar in the evening)!  Thanks DevOpsDays!

Calling DevOps teams an antipattern is an antipattern

Calling DevOps teams an anti-pattern IS an anti-pattern because needing no DevOps team is the destination but not necessarily the journey.

In Star Wars terms, achieving the Continuous Delivery Capability without a supporting DevOps team is Jedi. Making tangible improvements with the support of a dedicated team could be the rebel alliance, but is definitely not the dark side!

Enterprises should feel an urgency to try the Continuous Delivery practice and must feel empowered to experiment on how best to adopt it in their own organisation.  All organisations, whilst united by the need to adopt the Continuous Delivery practice, have a variety of different cultures, skills, histories, priorities, accountabilities, politics, etc.  Therefore it is logical to expect that a variety of approaches will be effective towards achieving Continuous Delivery.

I don’t disagree with some of the arguments against teams, but I just feel that DevOps teams should be understood as having traps to avoid, rather than being a trap to avoid.  In the wrong hands anything powerful usually has the ability to do at least as much bad as good.  Heed the cautions and make sure yours aren’t the wrong hands!

What DevOps strives to achieve

To summarise, DevOps aspires to achieve highly responsive and reliable IT functions that are globally optimised to support and deliver the business objectives.  In other words: “Deliver whatever IT is needed, when it’s needed, and consistently until the need disappears or changes.”  Or, put another way: “Restore faith in IT by going the difficult last mile beyond agile development to achieve agile releasing of software.”

What DevOps typically entails in practice

The prominent pattern for doing DevOps is to adopt the Continuous Delivery practice.  DevOps and Continuous Delivery are heavily underpinned by effective solutions for the following core processes:

  • Version Control
  • Configuration and Release Management
  • Environment Management (development, test, and production)
  • Automation and orchestration throughout the software delivery life cycle of at least:
    • code release
    • quality assurance
    • infrastructure.

Take all the cross-team collaboration / peace and love you like: if the above let you down, you’ll not achieve Continuous Delivery and, worse, you’ll have problems.

What traditional Development teams typically achieve

We’ve all heard the statement: “Developers want Change and operators want Stability.”  Surely based on this, at least the developers are ready to embrace the changes required to implement Continuous Delivery?

Unfortunately it’s usually not this type of change!  It is very common for developers to feel pressured to focus almost exclusively on delivering functionality, even at the expense of neglecting all the non-functional “ilities”.  I’ve seen agile actually exacerbate this.  When Product Owners (usually aka functionality champions) are permanently based in the development team, the focus on working features, features, features can go too far.

Unfortunately regular demos, valuable as they are, can sometimes also negatively reinforce this functionality addiction.  Watching Chef rebuild an environment from version control appeals to a fairly select group of people (of which I am one).  But we are significantly outnumbered by people who prefer to watch the shadows move on fancy new sliders in a UI (or other functional things, not my bag).  The typical demo audience can put development teams off tasks which don’t demo well, for example automating server creation.

I’m not criticising agile or arguing that the above problems aren’t solvable.  These are just examples of how functionality changes come to dominate at the expense of developing the prerequisites for Continuous Delivery.  In fact, tasks related to Continuous Delivery are often left on the backlog directly below a block of non-functional requirements (if anyone even took the time to determine the relative priorities that far down the list).

What traditional Operations teams typically achieve

A common occurrence is that Operations teams dislike change.  It is absolutely standard for Operations to feel accountable for maximising two things:

  • Stability
  • Efficiency

Stability is threatened by all change: not just changes to the code and configuration of supported systems (as requested by Development), but also changes to people, processes and tools.

One compounding unpleasant phenomenon I’ve seen once or twice is a tendency to scapegoat old bad processes invented by much-blamed predecessors rather than take the risk of improving them, thus avoiding ownership and accountability.  This can lead to prevailing problems like heavily manual deployment processes or, worse, processes being badged unfit-for-purpose and completely ignored (as we saw with the Change Process in The Phoenix Project novel!).

Efficiency is often achieved by cutting back on anything outside core areas of accountability. Continuous Delivery is becoming easier to justify, but making the strategic benefits of automated deployments sound more appealing than the tactical benefits of saving money by hiring fewer people will never get easy.

Over-emphasis on efficiency in Operations parallels the over-emphasis on functionality in Development, and the result is to grease the tarmac on which Continuous Delivery needs traction.

Why would you want a DevOps team?

The examples above highlight why both Development and Operations are never going to have an easy ride implementing Continuous Delivery.  We can also understand from them why Development and Operations teams so often consider themselves locally successful without feeling the need to address end-to-end cross-team effectiveness and the promise of DevOps.

A major contributing factor to the general lack of progress towards Continuous Delivery is the lack of clarity over ownership.  The core concerns listed above don’t usually fit naturally into either team and as a result get left festering (and failing) in the gap between them.  I’ve read enough about DevOps to know that to be taken seriously, you generally need a Deming quote.  I found this one appropriate:

“Hold everybody accountable? Ridiculous!”, W. Edwards Deming 

Finding a person accountable for each of the awkward core processes is necessary.  Assuming that this person needs dedicated (or shared) resources to succeed, you may have just got yourself a team which could be called DevOps.

What would a DevOps team do?

If we’re ready to make someone accountable for addressing the problems, the two main things that their team could do are:

  1. Provide an appropriate set of services to Development and Operations (on-going).  See here for a full discussion.
  2. Run a transformation programme to expedite adoption of Continuous Delivery (temporary).  See here for a full discussion.

A quick critique of “how to do DevOps without a team” advice that I’ve read

Give developers more visibility of production.
Great idea, but this isn’t going to build an automated tool chain.

Encourage developers and operations staff to have lunch together 
Not bad; it will probably improve the local productivity of the developers who now have buddies in Operations ready to cut them special treatment.  But this type of thing is going to take a long time to make a difference in a big enterprise.

Standardise the metrics that influence performance-based pay across Development and Operations.
I like this one, but it is extremely difficult in a major organisation and certainly not without the risk of unexpected side-effects.

More than happy to discuss all other ideas anyone cares to suggest!

Conclusion

If you accept that the above activities are not an anti-pattern, perhaps the only remaining argument is what to call the responsible team.  If you don’t accept them, as one last go, please drill down to my detailed discussions on service and transformation teams.

Personally I think that so long as the people involved understand what DevOps actually is, the name DevOps gives them a daily reminder not to lose sight of their real objective.  Being in a release management team that slows down delivery may carry no stigma, but there is nothing more embarrassing than being in a DevOps team that gets associated with impeding DevOps!

Having a DevOps Change Programme Team is a Pattern

As I described here, calling DevOps teams an anti-pattern IS an anti-pattern because needing no DevOps team is the destination but not necessarily the journey.

The problem was I got very over-enthusiastic and the blog became very long.  Hence I’ve broken out describing why a DevOps Change Programme team is a pattern here.

I described two valid types of team:

  1. One that provides an appropriate set of services to Development and Operations (on-going).  Described here.
  2. One that runs a change programme to expedite adoption of Continuous Delivery (temporary).  Described right here in this blog!

DevOps Change Programme Team

For some organisations, depending on how far they have to go, some steps towards Continuous Delivery will just be too hard to deliver from one of the core teams or a service.

Creating a DevOps programme/project team may not be for everyone, but it is a valid endeavour.  Such a programme would take end-to-end ownership of increasing Continuous Delivery maturity and achieving a globally optimised IT function.  Programmes may bring a number of advantages.  By requiring investment, they automatically set a level of expectation among senior folk, which can help establish credibility, i.e. that this is really happening.

The success of the programme should be measured by three outcomes:

  1. the programme-assessed improvements against the original baseline,
  2. the recognition of those improvements by Development and Operations, and
  3. the general view of the outcome from Development and Operations.

Projects making up the programmes could include analysis of current processes, taking a baseline of current performance and performing R&D on tools and processes.

R&D is interesting because it can achieve a lot without even looking outside an organisation.  Most organisations will have systems and processes of varying maturity which provide ready-made case studies of what works and what does not.  They will also have people from a variety of backgrounds and experiences combined with context specific insight into how and what to improve.

Projects can be used to set up the acceptable services listed above.  I’d even go a little further and say my test 1 about not abstracting people from their core responsibilities can be temporarily relaxed (pending later correction).  For example, if introducing a major new tool like Puppet or Chef, the level of effort may warrant temporary extra assistance from a dedicated, experienced team to write the first cut of configuration files.  Obviously this must be done with caution and mindfulness of the need to transition responsibilities back to the rightful owners.

Programmes also have a natural advantage of being painfully aware that they have to convince both Development and Operations that they have their best interests at heart.  Staying on the subject of introducing automated Configuration Management tools: when these are driven from Development, they face an almighty challenge getting taken seriously and anywhere near the production environment.  When they are driven from Operations, the sell to developers that they now need to take on previously Operations-only concerns is not an easy one.  When they are driven by an independent team, these concerns need to be addressed openly up front.

Conclusion

A well executed DevOps Change Programme can be very effective and having a DevOps Change Programme Team is a Pattern!

Having a DevOps Service Team is a Pattern

As I described here, calling DevOps teams an anti-pattern IS an anti-pattern because needing no DevOps team is the destination but not necessarily the journey.

The problem was I got very over-enthusiastic and the blog became very long.  Hence I’ve broken out describing why a DevOps Service team is a pattern here.

I described two valid types of team:

  1. One that provides an appropriate set of services to Development and Operations (on-going).  Described right here in this blog!
  2. One that runs a change programme to expedite adoption of Continuous Delivery (temporary).  Described here.

DevOps Service Team

Determining what services are appropriate is at the heart of the controversy around DevOps teams.  Another way of putting it is “when is an acceptable service not also an evil functional silo?”  For me, any service that a DevOps team might provide must pass the following tests:

  1. Test: Does it abstract Development and/or Operations from the consequences of their actions and their core responsibilities?
  2. Test: Does it complicate communications and the interface between Development and Operations?

Suppose a DevOps Service team runs the Continuous Integration / Continuous Delivery orchestration tooling.  This does not move the developers away from writing or testing code, and it doesn’t move the operators away from their core concerns, so it passes test 1.  These tools are also great for improving communication because they make mundane but critical information, such as ‘what code has been deployed where’, transparent, freeing up communication channels for more complex matters, so it passes test 2.  CloudBees have demonstrated that people want this by offering Jenkins as a SaaS.  In the organisation where I work, we also provide a similar (but more extensive) service.

What about hosting Version Control tools?  If your organisation has problems with version control, making the tools the responsibility of someone from a DevOps team will not fail either of my tests above.  GitHub make the success of this model pretty obvious and may also be an appropriate solution for you.  Alternatively (for various reasons, e.g. possibly cheaper in the long term, consistent with company Infosec requirements, etc.), an internal DevOps team could be your solution.

What about writing Automated Tests?  Ok, ok, just checking you are concentrating.  This immediately fails my first test.  If Development doesn’t write the tests, they are abstracted from whether the code they are writing even works!  DevOps teams shouldn’t write automated tests!

What about setting up Automated Build Scripts?  This fails my first test.  If a developer doesn’t even know how their code is going to be built for release outside their local workstation, they have been abstracted too far away from writing working code.  However, we live in an imperfect world and there are times when in reality you have to balance test 1 and 2 against test 3:

  1. Test: Do the Development or Operations teams have the capacity and skills to deliver these?

Thankfully, developers generally do understand how to create build scripts.  But there are certainly cases, for example with many packaged products, where the developer process for building and deploying code is not easy to script or decouple from being a manual trigger inside the IDE.  Almost invariably this leads to an unceremonious end to a Continuous Delivery journey before even achieving Continuous Integration.  Use of a DevOps service with specialist skills could be appropriate for owning and solving this problem of extracting the requisite command-line build process.  I also want to draw a distinction between setting up scripts and maintaining them.  Once created, developers need to own build scripts and be empowered to maintain them without an inefficient and/or asynchronous external dependency.

As a side point on specialist skills, these can be much easier to develop within, or attract to, a dedicated DevOps service team.  Trying to persuade some developers that they are interested in scripts, or in any code that won’t feature prominently in production, isn’t easy.  Trying to persuade operators that they need to write more scripts is usually an easier sell, but giving them a dedicated focus can help a lot.  If you do hire specialists, you also need to be able to protect them.  I’ve seen a number of examples of specialists hired to work on DevOps processes getting quickly and irreversibly dragged into normal Development or Operations work (i.e. helping with what those teams typically achieve – see the sections above!).

DevOps Service Team: Development and Test environments

What about creating and supporting Development and Test environments?  By this I mean ownership of non-production environments where code is almost continually refreshed and tested and may be under constant use by tens if not *gulp* hundreds of people.  If environments are experiencing unplanned outages, or even just slow code deployments, this costs the organisation a lot of wasted productivity (not to mention morale).  Since this type of activity is usually on the critical path for releasing software, lost non-production environment time will usually also mean delayed releases.

To an extent, creating and supporting non-production environments with a DevOps team fails test 1.  The further an Operations team is from the environments, the later they will find out about new changes that will impact production.  It also fails test 2, as the classic “it works on my development machine” response to things not working in production now has an evil new breed of mutant cousins: “it works in the System Test environment”, “it works in the Integration Test environment”, etc.

But despite this apparent “new” problem created above, in my experience it usually already exists, because I rarely see Operations successfully owning all of the lower environments.  Instead they will be owned by Development (or specifically “Testing”).  This not only distances Operations from the environments but usually breeds frustration within Development.  Wherever Development have “control”, things appear to “work” (or at least get fixed according to their priorities), but wherever in the life cycle Operations take over, there is carnage.  The later that is, the worse the effect.  In fact, you get a problem with a lot of parallels to Martin Fowler’s infamous software integration phase.

Even when creating and supporting Development and Test environments is owned by Operations, this often doesn’t go well because test 3 fails.  Operations then reduce the burden of supporting non-production environments (aka “optimising” efficiency) by handing over full access to the environments to Development.  This quickly breeds snowflakes, Operations are once again abstracted from the details of non-production environments, and trouble is stored up until the first environment that they keep locked down.

Another challenge with Operations managing non-production environments is that they achieve the perception of overall service “stability” by playing a numbers game, i.e. prioritising effort based on perceived level of impact.  This equation rarely favours anything non-production, and Operations teams always have a solid gold excuse for neglecting development and test environments and tools: “the Prod Card”.  Playing the Prod Card grants impunity from blame despite almost any amount of neglect of non-production.  The only way to protect non-production is by identifying people who are dedicated to it, and this could be a shared DevOps service.

Yet another challenge with Operations managing non-production environments can be that they don’t yet exist.  It’s not unusual on projects that the operator has either not yet been appointed or, if internal, not yet been commissioned to work on the project.  I’m not arguing that this is desirable, but in those cases having a separate service to manage things can be more effective than having nothing at all.

All in all, non-production environment management is often a big problem which can be solved by a dedicated team, provided they acknowledge their existence in relation to the three tests and manage their shortcomings accordingly, and provided they are aware of the impending ‘integration phase’ when transitioning to Operations.

Conclusion

A well designed DevOps Service Team is a Pattern!