Think of the most out-of-nowhere and surprising incident you've experienced. I mean the ones that you know will be told over and over for years because the story is so bananas. The stuff that made it possible for you and your colleagues to handle it...is resilience.
In this talk I'm going to describe what that "stuff" is, and then I'll talk about how it's incredibly hard to recognize it. A well-known and contrarian adage in the Resilience Engineering community is "Murphy's Law is wrong. What could go wrong almost never does, but we don't notice that — we just call it 'normal work.'" I'd like to help you understand the relationships between resilience, resilience engineering, learning from incidents, incident analysis, and other topics in the hope that you can see what a small (but fast-growing) community already sees...and cannot unsee.
Interview:
What's the focus of your work these days?
My colleagues and I are a small group that, for the last five years, has been bringing methods, approaches, techniques, and concepts from fields that study complex work and expertise to the world of software. My master's degree is in human factors and system safety; my background is in software. Prior to this job, I was CTO at a company in Brooklyn called Etsy. The fields that I'm referring to are fields like human factors, cognitive systems engineering, and resilience engineering. In my talk, I try to explain what those fields, in particular the field of resilience engineering, have come to understand, in as accessible a way as I can.
It's a field that is growing in interest in the software world. The field is a little over twenty years old, and only recently has the world of software come to it. As a result, a lot of the work that we do either focuses on or is adjacent to incident analysis and understanding how people handle complex and, in many cases, surprising and unanticipated situations.
What's the motivation for your talk at QCon New York 2023?
I firmly believe that the industry is at the beginning of understanding and exploring the contribution that people make to their work. People are the only adaptive element in your organization. It is really tempting to believe that you can build in some automation that can "think" or do some of your work for you, but that's not what research in real-world environments like power plants, aviation, medicine, and space travel says. When it comes to genuine adaptation, to handling things that cannot be anticipated, to reacting to unforeseen events, people are the greatest strength. Things work because people are bridging the gap between what the software, the application, and the automation are designed to do and what reality challenges them to do, and we do it so well.
How would you describe your main persona and target audience for this session?
The only criterion for attendees to get something out of the session is to have hands-on practitioner experience in production, either currently or in the past.
Is there anything specific that you'd like people to walk away with after watching your session?
A couple of things. The first is that at every moment, the design, the modification, and the recalibration of the software that helps people do their work should include, or be fueled by, the actual work that they do. I don't mean this in a narrow user-experience way, but in a cognitive work way, so I'm going to arm people with a handful of heuristics that I would want them to have in mind when they're approaching their work.
The other is, as I mentioned before, that I believe the industry is on a precipice, one that to me is incredibly similar to the 2008 to 2010 period, when continuous deployment and delivery and the idea of DevOps emerged. Having been there, and having understood that it was born out of practitioners' grassroots recognition that the way software is designed and operated could fundamentally change, I believe that same paradigm shift is happening right now, and this talk is part of laying out the support that brings me to that conclusion.
What's something interesting that you've learned from the previous QCon?
What sticks out for me about QCon is certainly the interesting and insightful nuggets from the talks. However, what I remember just as much are the connections. When I've been to the conference, whether speaking or attending, I've noticed connections between what would otherwise look on the schedule like parallel but separate topics. It's hard for me not to see a through line of connections between and across topics, and that's something that sticks out to me at QCon. You can't get that from just looking at the schedule.
When you're in the sessions, you go to one session and think, "Oh, this is about ABC." And in the afternoon, "Oh, this is about apples and oranges." As I'm in the "apples and oranges" session, there are things about the "ABC" session that connect in my mind with the "apples and oranges" one, and I think that's a strength that QCon has.
Speaker
John Allspaw
Founder and Principal @Adaptive Capacity Labs
John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John’s publications include the books The Art of Capacity Planning (2009) and Web Operations (2010) as well as the foreword to “The DevOps Handbook.” His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation,” helped start the DevOps movement. John served as CTO at Etsy, and holds an MSc in Human Factors and System Safety from Lund University.