Here are some highlights from Day 1 of DevOpsDays London (November 2013) that I wanted to share.
Mark Burgess gave an interesting talk based on his new book In Search of Certainty. For those who don’t already know Mark, he created CFEngine, which is the father of Puppet and the grandfather of Chef. CFEngine began 20 years ago, so for those of us with only a couple of years’ experience of Configuration Management tools, it is fascinating to learn the intricacies from someone who has been developing and observing them for two decades.
Mark started by challenging the notion that “consistency equals quality”, recognizing the almost organic nature of highly complex systems. Overall, it was fairly challenging stuff to start the day with…
Mark ended his talk with the sobering thought that the human race is becoming so dependent on complex IT systems that at some point our very prospects of survival will depend on them. This left me thinking: it’s hard enough using DevOps techniques to deliver more continuously, did he have to raise the stakes so much higher!?
Doug Barth from PagerDuty gave a great talk called Failure Fridays, which could also be called Chaos Monkey for Mortals (i.e. artificially creating failure for testing when you are not Netflix). It was a great session because he really explained the ins and outs of what they do: spending one hour per week (in total) systematically breaking, observing and fixing their system, only creating failures at a scale they expect to tolerate without impacting their customers.
Where I work, system delivery projects include formally testing these kinds of failures up front, and we call it Operator Acceptance Testing. However, I definitely like the idea of making it an ongoing activity (it certainly resonates with the “if it’s difficult, do it often” mantra).
Practical considerations for doing this weekly against the live system included:
- constrain to one hour (end to end)
- involve all impacted parties in one room (a chat/video conference room if necessary)
- lots of planning and careful risk assessment beforehand
- be careful what you disable:
  - disabling auto-healing mechanisms = probably good
  - disabling monitoring and alerts (the things you want to test) = bad
- effective communication before, during and afterwards (this is not like a surprise fire drill)
- start small, e.g. with one server, and scale up to a whole bank of servers
Doug also described other virtues of this activity, such as the great opportunity it creates to train new team members and the way it increased developers’ focus on failure. I’d bet that at least half the people in the audience went away wanting to implement Failure Fridays, and many of them will actually do it. Just not on a Friday, though: why jeopardize Saturdays?
Ben Hughes from Etsy gave the next talk, on security, and it was a very lively, entertaining and informative session. Rather than try to do it justice here, I highly recommend watching the video of his 5-minute Ignite talk from DevOpsDays Portland. However, there is one quote I can’t resist sharing (apparently credited to Ben’s boss):
“If you go down from 35% on fire to 24% on fire, you are still on fire”
– referring to issues whose existence you have to accept and manage, rather than trying to avoid or deny the possibility of them happening.
The afternoon Open Sessions of course delivered some lively debate. It was particularly interesting to see some early discussions of Docker versus Configuration Management, a debate that is likely to rage on!
All in all, an excellent first day (even without mentioning the free bar in the evening)! Thanks, DevOpsDays!