How Psychologically Safe does your team feel?

As per this article, Google conducted a two-year-long study into what makes their best teams great and found psychological safety to be the most important factor.

As per Wikipedia, psychological safety can be defined as:

“feeling able to show and employ one’s self without fear of negative consequences of self-image, status or career”

It certainly seems logical to me that creating a safe working environment, where people are free to share their individual opinions and avoid groupthink, is highly important.

So the key question is how can you foster psychological safety?

Some of the best advice I’ve read comes from this blog by Steven M Smith.  He suggests performing a paper-based voting exercise to measure safety.

Whilst we’ve found this to be a good technique, the act of screwing up bits of paper is tedious and hard to do remotely.  Hence we’ve created an online tool:

https://safetychecker.herokuapp.com/

Please give it a go and share your experiences!

How my team do root cause analysis

This blog is more or less a copy and paste of a wiki page that my team at work use as part of our Problem Management process.  It is heavily inspired by lots of good writing about blameless postmortems, for example from Etsy and the Beyond Blame book.  I hope you find it useful.

RCA Approach


This page describes a 7 step approach to performing RCAs.  The process belongs to all of us, so please feel free to update it.

Traditionally RCA stands for Root Cause Analysis.  However, there are two problems with this:

  1. It implies there is one root cause.  In practice it is often a cocktail of contributing causes as well as negative (and sometimes positive) outcomes.
  2. The name implies that we are on a hunt for a cause.  We are on a hunt for causes, but only to help us identify preventative actions, not just to solve a mystery or, worse, find an offender to punish.

Therefore RCA is proposed to stand for Recurrence Countermeasure Analysis.

Step 1: Establish “the motive”

Ask the following:

Question: Does anyone think anyone in our team did something deliberately malicious to cause this?  i.e. did they consciously carry out actions that they knew would cause this (or something of similarly negative consequence), or did they clearly understand the risks but care so little that they weren’t deterred?

and

Question: Does anyone think anyone outside our team… (as above).

The assumption here is that the answer is “NO” to both questions.  If it is “NO”, we can now proceed in a blameless manner, i.e. never stopping our analysis at a point where a person should (or could) have done something different.

If either answer is “YES”, the situation is beyond the scope of this approach.

Step 2: Restate our meaning of “Blameless”

Read aloud the following to everyone participating in the RCA:

“We have established that we don’t blame any individual, either internal or external to our organisation, for the incident that has triggered this exercise.  Our process has failed us and needs our collective input to improve it.  If at any point during the process anyone starts to doubt this statement, or act like they no longer believe it, we must return to Step 1.  Everyone is responsible for enforcing this.

What is at stake here is not just getting to the bottom of this incident, it’s getting to the bottom of this incident and every future occurrence of the same incident.  If anyone feels mistreated by this process, human nature dictates that they will act in future to disguise their actions and limit blame, and this will damage our ability to continuously improve.”

Step 3: Restate the rules

During this process we will follow these rules:

  1. Facts must not be subjective.  If an assertion of fact cannot be 100% validated, we should agree and capture our confidence level (e.g. High, Medium, Low).  We must also capture the actions that we could take to validate it.
  2. If we don’t have enough facts, we will prioritise the facts that we need to go away and validate before reconvening to continue.  Before suspending the process, agree a full list of “Things we wish we knew but don’t know”, capture the actions that we could take to validate them, and prioritise the discovery.
  3. If anyone feels uncomfortable during the process due to:
    1. Blame
    2. Concerns with the process
    3. Language or tones of voice
    4. Their ability to have their voice heard
    they must raise it immediately.
  4. We are looking for causes only to inform what we can do to prevent re-occurrence, not to apportion blame.

Step 4: Agree a statement to describe the incident that warranted this RCA

Using an open discussion, attempt to reach a consensus over a statement that describes the incident that warranted this RCA.  This must identify the thing (or things) that we don’t want to happen again (including all negative side-effects).  Don’t forget the impact on people, e.g. having to work late to fix something.  Don’t forget to capture the problem from all perspectives.

Write this down somewhere everyone can see.

Step 5: Mark up the problem statement

Look at the problem statement and identify and underline every aspect of the statement that someone could ask “Why” about.  Try to take an outsider view: even if you know the answer or think something cannot be challenged, it is still in scope for being underlined.

Step 6: Perform the analysis

Document the “Why” question related to each underlined aspect in the problem statement.

For each “Why” question, attempt to agree on one direct answer.  If you find you have more than one direct answer, split your “Why” question into enough more-specific “Why” questions so that each answer correlates directly with one question.  For example, “Why did the release overrun?” may need splitting into “Why did the deployment fail?” and “Why did recovery take until midnight?”.

Mark up the answers as you did in Step 5.

Repeat this step until you’ve built up a tree with at least 5 answers per branch and at least 3 branches.  If you can’t find at least 3 branches, you need to ask more fundamental “Why” questions about your problem statement and answers.  If you can’t ask and answer more than 5 “Why”s per branch, you are possibly taking steps that are too large.

Do not stop this process with any branch ending on a statement that could be classified “human error”.  (Refer to what we agreed at step 1).

Do not stop this process at something that could be described as a “third party error”.  Whilst the actions of third parties may not be directly under our control, we have to maintain a sense of accountability for the problem statement: if necessary, we should have implemented measures to protect ourselves from the third party.

Step 7: Form Countermeasure Hypothesis

Review the end points of your analysis tree and form hypotheses about actions that could be taken to prevent future re-occurrences.  Like all good hypotheses, these should be specific and testable.

Use whatever mechanism you have for capturing and prioritising proposed work to track the identified actions and get them implemented.  Use your normal approach to stating acceptance criteria, and don’t close the actions unless they satisfy the tests that prove they have been effective.


Improving Lives – The MQ Mental Health Charity

MQ are a startup charity from the UK with the ambitious mission to transform mental health.

With one in four of us experiencing mental health conditions, and two-thirds of people with a mental illness currently receiving no treatment, this cause is clearly both relevant and urgent.  If the potential for dramatically improving lives isn’t important enough, the economic benefits also stack up, with mental ill health being the number one cause of work sickness absence.  The estimated annual economic and social costs in the UK alone are £105 billion (billion with a b)!

Despite its importance, improving mental health receives less than 6% of the total UK health research budget.

A charity is clearly needed to change this, and MQ have picked up the challenge.  One easy way to think about them is as the cancer research charity of mental health.  However, the comparison has a major flaw, and that is the funding of their domain.  For every £1 spent by the UK Government on mental health research, the public give just £0.003.  The equivalent figure for cancer is £2.75.

Very little of our time as a species has prepared us for the connected online digital world and the associated pressures we now face.

Research suggests 75% of mental health problems first occur in under-18s and that 3 children in every classroom will be living with a mental illness.

With most children today heavily using technology there is a temptation to be pessimistic about the future.  However, there are extraordinary opportunities for technologies such as telemetry (IoT), predictive analytics, and freely available big data sources like social media to dramatically advance this important field and hence improve lives.

So how can we help?

  1. In early 2017 MQ are launching their first major public campaign. Let’s be on the lookout for it and do our best to amplify MQ on social media.
  2. As part of the campaign they are looking for people to pledge their support to transforming mental health with their “We swear to tackle mental illness” campaign.  You can do this now: get your best selfie and get on over to the site.
  3. Volunteer your time and especially any charity expertise (they are a startup).
  4. Raise money any way you like and donate it here.
  5. Spread the word!


(All stats in this blog taken from MQ materials.)

Using ADOP and Docker to Learn Ansible

As I have written here, the DevOps Platform (aka ADOP) is an integration of open source tools that is designed to provide the tooling capability required for Continuous Delivery.  Through the concept of cartridges (plugins) ADOP also makes it very easy to re-use automation.

In this blog I will describe an ADOP cartridge that I created as an easy way to experiment with Ansible.  Of course there are many other ways of experimenting with Ansible, such as using Vagrant.  I chose to create an ADOP cartridge because ADOP is so easy to provision and so predictable.  If you have an ADOP instance running, you will be able to experience Ansible doing various interesting things in under 15 minutes.

To try this for yourself:

  1. Spin up an ADOP instance
  2. Load the Ansible 101 Cartridge (instructions)
  3. Run the jobs one-by-one and in each case read the console output.
  4. Re-run the jobs with different input parameters.

For anyone only loosely familiar with ADOP, Docker and Ansible, I recognise that this blog could be hard to follow, so here is a quick diagram of what is going on.

[Diagram: docker-ansible: how the Jenkins jobs, the Docker containers and Ansible fit together]

The Jenkins Jobs in the Cartridge

The jobs do the following things:

As its name suggests, the first job just demonstrates how to install Ansible on CentOS.  It installs Ansible in a Docker container in order to keep things simple and easy to clean up.  Having built a Docker image with Ansible installed, it tests the image just by running the following inside the container:

$ ansible --version
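If you want to reproduce this step outside ADOP, here is a minimal sketch of the same idea (the Dockerfile and image name are my illustration, not the cartridge’s actual code):

$ cat > Dockerfile <<'EOF'
# CentOS base image; Ansible is available from the EPEL repository
FROM centos:7
RUN yum install -y epel-release && yum install -y ansible
EOF
$ docker build -t ansible-101 .
$ docker run --rm ansible-101 ansible --version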

2_Run_Example_Adhoc_Commands

This job is a lot more interesting than the previous one.  As the name suggests, the job is designed to run some ad hoc Ansible commands (which is one of the first things you’ll do when learning Ansible).

Since the purpose of Ansible is infrastructure automation, we first need to set up an environment to run commands against.  My idea was to set up an environment of Docker containers pretending to be servers.  In real life I don’t think we would ever want Ansible configuring running Docker containers (we normally want Docker containers to be immutable and certainly don’t want them to have ssh access enabled).  However, I felt it was a quick way to get started and create something repeatable and disposable.

The environment created resembles the diagram above.  As you can see, we create two Docker containers (acting as servers) calling themselves web-node and one calling itself db-node.  The images already contain a public key (the same one Vagrant uses, actually) so that they can be ssh’d to (once again, not good practice with Docker containers, but needed so that we can treat them like servers and use Ansible).  We then use an image which we refer to as the Ansible Control Container.  We create this image by installing Ansible and adding an Ansible hosts file that tells Ansible how to connect to the db and web “nodes” using the same key mentioned above.
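For illustration, such a hosts (inventory) file might look something like this; the group names and key path here are my assumptions, not the cartridge’s exact file:

$ cat > hosts <<'EOF'
# Group the pretend servers so commands can target "web", "db" or "all"
[web]
web-node-1
web-node-2

[db]
db-node

[all:vars]
ansible_user=root
ansible_ssh_private_key_file=/root/.ssh/id_rsa
EOF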

With the environment in place the job runs the following ad hoc Ansible commands:

  1. ping all web nodes using the Ansible ping module: ansible web -m ping
  2. gather facts about the db node using the Ansible setup module: ansible db -m setup
  3. add a user to all web servers using the Ansible user module: ansible web -b -m user -a "name=johnd comment='John Doe' uid=1040"

By running the job and reading the console output you can see Ansible in action and then update the job to learn more.

3_Run_Your_Adhoc_Command

This job is identical to the job above in terms of setting up an environment to run Ansible.  However, instead of running the hard-coded ad hoc Ansible commands listed above, it allows you to enter your own commands when running the job.  By default it pings all nodes:

ansible all -m ping
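If everything is wired up correctly, the output for each node looks roughly like this (abbreviated; exact format varies by Ansible version):

web-node-1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}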

4_Run_A_Playbook

This job is identical to the jobs above in terms of setting up an environment to run Ansible.  However, instead of passing in an ad hoc Ansible command, it lets you pass in an Ansible playbook to run against the nodes.  By default, the playbook that gets run installs Apache on the web nodes and PostgreSQL on the db node.  Of course you can change this to run any playbook you like, so long as it is set to run on a host expression that matches: web-node-1, web-node-2, and/or db-node (or “all”).
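Here is a minimal sketch of a playbook with that shape (not the cartridge’s exact playbook; the module arguments are kept deliberately simple):

$ cat > playbook.yml <<'EOF'
# Install Apache on the web nodes
- hosts: web
  become: yes
  tasks:
    - name: Install Apache
      yum: name=httpd state=present

# Install PostgreSQL on the db node
- hosts: db
  become: yes
  tasks:
    - name: Install PostgreSQL
      yum: name=postgresql-server state=present
EOF
$ ansible-playbook -i hosts playbook.yml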

How the jobs 2-4 work

To understand exactly how jobs 2-4 work, the code is reasonably well commented and should be fairly readable.  However, at a high level, the following steps are run:

  1. Create the Ansible inventory (hosts) file that our Ansible Control Container will need so that it can connect (over ssh) to our db and web “nodes” to control them.
  2. Build the Docker image for our Ansible Control Container (install Ansible as in the first Jenkins job, and then add the inventory file).
  3. Create a Docker network for our pretend server containers and our Ansible Control Container to all run on.
  4. Create a docker-compose file for our pretend servers environment (sketched below).
  5. Use docker-compose to create our pretend servers environment.
  6. Run the Ansible Control Container, mounting in the Jenkins workspace if we want to run a local playbook file, or, if not, just running the ad hoc Ansible command.
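As an illustration of steps 4 and 5, the docker-compose file might look roughly like this (the service names follow the diagram; the image name is my assumption):

$ cat > docker-compose.yml <<'EOF'
# Three containers pretending to be servers, all joined to one network
version: '2'
services:
  web-node-1:
    image: ansible-target   # assumed image with sshd and the public key baked in
    networks: [ansible_net]
  web-node-2:
    image: ansible-target
    networks: [ansible_net]
  db-node:
    image: ansible-target
    networks: [ansible_net]
networks:
  ansible_net: {}
EOF
$ docker-compose up -d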

Conclusion

I hope this has been a useful read and has clarified a few things about Ansible, ADOP and Docker.  If you found it useful, please star the GitHub repo and/or share a pull request!

Bonus: here is an ADOP Platform Extension for Ansible Tower.

Running the DevOps Platform on Microsoft Azure

As per my last post about GCE, sometimes knowing something is possible just isn’t good enough.  So here is how I spun up the DevOps Platform on the Microsoft Azure cloud.  Warning: thanks to Docker Machine, this post is very similar to this earlier one.

1. I needed an Azure account.

2. I logged into my Azure account and didn’t click “view the new Portal”.

3. On the left-hand menu, I scrolled down to the bottom (it didn’t immediately look to me like it would scroll, so hover over it) and clicked Settings.  Here I was able to see my subscription ID and copy it.

4. (Having previously installed Docker Toolbox, see here) I opened Git Bash (as an Administrator) and ran this command:

$ docker-machine create --driver azure --azure-size Standard_A3 --azure-subscription-id <the ID I just copied> markos01

I was prompted to open a URL in my browser, enter a confirmation code, and then log in with my Azure credentials.  Credit to Microsoft: this was easier than GCE, for which I needed to install the gcloud command-line utility!

You will notice that this is fairly standard.  I picked a Standard_A3 machine type, which is roughly equivalent to what we use for AWS and GCP.

5. I waited while a machine was created in Azure containing Docker
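While waiting, you can check on progress from another shell using standard Docker Machine commands (nothing Azure-specific here):

$ docker-machine ls            # lists machines and their state
$ docker-machine ip markos01   # prints the public IP once the machine is up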

6. I cloned the ADOP Docker Compose repository from GitHub:

$ git clone https://github.com/Accenture/adop-docker-compose
$ cd adop-docker-compose

7. I ran the normal startup.sh command as follows:

$ ./startup.sh -m markos01 -c NA

And entered a user name (thanks to this recent enhancement), and hey presto:

...
SUCCESS, your new ADOP instance is ready!
Run these commands in your shell:
eval "$(docker-machine env $MACHINE_NAME)"
source env.config.sh
Navigate to http://52.160.97.159 in your browser to use your new DevOps Platform!

And just to prove it:

$ whois 52.160.97.159 | grep Org
Organization: Microsoft Corporation (MSFT)
OrgName: Microsoft Corporation
OrgId: MSFT

8. I had to go to All resources > markos01-firewall > Inbound security rules and add a rule to allow HTTP to my server on port 80.

9. I viewed my new Azure-hosted ADOP instance in (of course…) Chrome! 😉

More lovely stuff!


Running the DevOps Platform on Google Compute Engine

Sometimes knowing something is possible just isn’t good enough.  So here is how I spun up the DevOps Platform on Google Compute Engine (GCE).

1. I needed a Google Compute Engine account.

2. I enabled the Google Compute APIs for my GCE account

3. I installed the Google Cloud command-line utility (gcloud)

4. I opened the Google Cloud SDK Shell link that had appeared in my Windows Start menu and ran:

C:\> gcloud auth login

This popped open a Chrome window and asked me to authenticate against my GCE account.

5. (Having previously installed Docker Toolbox, see here) I opened Git Bash (as an Administrator) and ran this command:

$ docker-machine create --driver google \
                 --google-project <a project in my GCE account> \
                 --google-machine-type n1-standard-2 \
                 markosadop01

You will notice that this is fairly standard.  I picked an n1-standard-2 machine type, which is roughly equivalent to what we use for AWS.

6. I waited while a machine was created in Google containing Docker

7. I cloned the ADOP Docker Compose repository from GitHub:

$ git clone https://github.com/Accenture/adop-docker-compose
$ cd adop-docker-compose

8. I ran the normal startup.sh command as follows:

$ ./startup.sh -m markosadop01 -c NA

And hey presto:

...
SUCCESS, your new ADOP instance is ready!
Run these commands in your shell:
eval "$(docker-machine env $MACHINE_NAME)"
source env.config.sh
Navigate to http://104.197.235.64 in your browser to use your new DevOps Platform!

And just to prove it:

$ whois 104.197.235.64 | grep Org
Registrant Organization: Google Inc.
Admin Organization: Google Inc.
Tech Organization: Google Inc.

9. I had to go to Networks > Firewall rules and add a rule to allow HTTP to my server.
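If you prefer to stay on the command line, something like this should achieve the same effect (the rule name is my choice; review the source ranges and target tags for your own setup):

$ gcloud compute firewall-rules create allow-http --allow tcp:80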

10. I viewed my new Google-hosted ADOP instance in (of course…) Chrome!

Lovely stuff!

Start Infrastructure Coding Today!

* Warning this post contains mildly anti-Windows sentiments *

It has never been easier to get ‘hands-on’ with Infrastructure Coding and Containers (yes including Docker), even if your daily life is spent using a Windows work laptop.  My friend Kumar and I proved this the other Saturday night in just one hour in a bar in Chennai.  Here are the steps we performed on his laptop.  I encourage you to do the same (with an optional side order of Kingfisher Ultra).


  1. We installed Docker Toolbox.
    It turns out this is an extremely fruitful first step as it gives you:

    1. Git (and in particular Git Bash). This allows you to use the world’s best Software Configuration Management tool, Git, and welcomes you into the world of being able to use and contribute to Open Source software on GitHub.  Plus it has the added bonus of turning your laptop into something which understands good wholesome Linux commands.
    2. VirtualBox. This is a hypervisor that turns your laptop from being one machine running one Operating System (Windoze) into something capable of running multiple virtual machines with almost any Operating System you want (even unikernels!).  Suddenly you can run (and develop) local copies of servers that, from a software perspective, match Production.
    3. Docker Machine. This is a command line utility that will create virtual machines for running Docker on.  It can do this either locally on your shiny new VirtualBox instance or remotely in the cloud (even the Azure cloud – Linux machines, of course).
    4. Docker command line. This is the main command line utility of Docker.  This will enable you to download and build Docker images, and turn them into running Docker containers.  The beauty of the Docker command line is that you can run it locally (ideally in Git Bash) on your local machine and have it control Docker running on a Linux machine.  See the diagram below.
    5. Docker Compose. This is a utility that gives you the ability to run and associate multiple Docker containers by reading what is required from a text file.

[Diagram: DockerVB: the Docker command line on Windows controlling Docker inside a VirtualBox VM]
  2. Having completed step 1, we opened up the Docker Quickstart Terminal by clicking the entry that had appeared in the Windows start menu. This runs a shell script via Git Bash that performs the following (roughly equivalent commands are sketched just below):
    1. Creates a VirtualBox machine (called ‘default’) and starts it
    2. Installs Docker on the new virtual machine
    3. Leaves you with a Git Bash window open that has the necessary environment variables set to point the Docker command line utility at your new virtual machine.
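    Under the hood, that is roughly equivalent to running (a sketch, not the script’s exact contents):
    $ docker-machine create --driver virtualbox default   # create and boot a VM with Docker installed
    $ eval "$(docker-machine env default)"                # point the docker CLI at it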
  3. We wanted to test things out, so we ran:
    $ docker ps -a
    CONTAINER ID  IMAGE   COMMAND   CREATED   STATUS   PORTS  NAMES

    This showed us that our Docker command line tool was successfully talking to the Docker daemon (process) running on the ‘default’ virtual machine. And it showed us that no containers were either running or stopped on there.

  4. We wanted to test things a little further, so we ran:
    $ docker run hello-world

    Hello from Docker.
    This message shows that your installation appears to be working correctly.

    To generate this message, Docker took the following steps:
    The Docker client contacted the Docker daemon.
    The Docker daemon pulled the "hello-world" image from the Docker Hub.
    The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
    The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

    To try something more ambitious, you can run an Ubuntu container with:
    $ docker run -it ubuntu bash

    Share images, automate workflows, and more with a free Docker Hub account:
    https://hub.docker.com

    For more examples and ideas, visit:
    https://docs.docker.com/userguide

    The output is very self-explanatory, so I recommend reading it now.

  5. We followed the instructions above to run a container from the Ubuntu image.  This started a container running Ubuntu, and we ran a command to satisfy ourselves that we were running Ubuntu.  Note one slight modification: we had to prefix the command with ‘winpty’ to work around a tty-related issue in Git Bash.
    $ winpty docker run -it ubuntu bash
    root@2af72758e8a9:/# apt-get -v | head -1
    apt 1.0.1ubuntu2 for amd64 compiled on Aug  1 2015 19:20:48
    root@2af72758e8a9:/# exit
    $ exit


  6. We wanted to run something else, so we ran:
    $ docker run -d -P nginx:latest


  7. This caused the Docker command line to do more or less what is stated in the previous step, with a few exceptions:
    • The -d flag caused the container to run in the background (we didn’t need -it).
    • The -P flag caused Docker to expose the ports of Nginx back to our Windows machine.
    • The image was nginx rather than ubuntu.  We didn’t need to specify a command for the container to run after starting (leaving it to run its default command).
  8. We then ran the following to establish how to connect to our Nginx:
    $ docker-machine ip default
    192.168.99.100

    $ docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                           NAMES
    826827727fbf        nginx:latest        "nginx -g 'daemon off"   14 minutes ago      Up 14 minutes       0.0.0.0:32769->80/tcp, 0.0.0.0:32768->443/tcp   ecstatic_einstein

  9. We opened a proper web browser (Chrome) and navigated to http://192.168.99.100:32769/ using the information above (your IP address may differ).  Pleasingly, we were presented with the ‘Welcome to nginx!’ default page.
  10. We decided to clean up some of what we’d created locally on the virtual machine, so we ran the following to:
    1. Stop the Nginx container
    2. Delete the stopped containers
    3. Demonstrate that we still had the Docker ‘images’ downloaded


$ docker kill `docker ps -q`
8d003ca14410

$ docker rm `docker ps -aq`
8d003ca14410
2af72758e8a9
…

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
nginx                          latest              sha256:99e9a        4 weeks ago         134.5 MB
ubuntu                         latest              sha256:3876b        5 weeks ago         187.9 MB
hello-world                    latest              sha256:690ed        4 months ago        960 B

  11. We went back to Chrome and hit refresh. As expected, Nginx was gone.
  12. We opened Oracle VM VirtualBox from the Windows start menu so that we could observe our ‘default’ machine listed as running.
  13. We ran the following to stop our ‘default’ machine, and observed it then stop in VirtualBox:
    $ docker-machine stop default
  14. Finally, we installed Vagrant. This is essentially a much more generic version of Docker Machine, capable of creating virtual machines not just in VirtualBox for Docker but for many other purposes.  For example, from an Infrastructure Coding perspective, you might run a virtual machine for developing Chef code.
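    To give a flavour of Vagrant, a first session looks something like this (the box name is just an example):
    $ vagrant init hashicorp/precise64   # generates a Vagrantfile for an example box
    $ vagrant up                         # creates and boots the VM in VirtualBox
    $ vagrant ssh                        # opens a shell inside it
    $ vagrant destroy                    # cleans it all up again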


Not bad for one hour on hotel wifi!

Kumar keenly agreed he would complete the following next steps.  I hope you’ll join him on the journey and Start Infrastructure Coding Today!

  1. Learn Git. It really only takes 10 minutes with this tutorial LINK to learn the basics.
  2. Docker – continue the journey here
  3. Vagrant
  4. Chef
  5. Ansible


Please share any issues following this and I’ll improve the instructions.  Please share any other useful tutorials and I will add those also.