Why we created a new incident escalation platform
Published: Monday, 27 February 2023
👋 Hi, I am Mads Quist, and I'd like to share with you what motivated me to create All Quiet, a new platform for software engineering teams to collaborate on incidents.
The product I missed as an engineering manager
As an engineering manager, I was responsible for 5 teams. We were all working hard to build a great product, and we built all of it: the backend, the web frontend, and the Android and iOS apps. We had also fully adopted DevOps, meaning we managed all the infrastructure ourselves, without a dedicated administrator or the like.
This was actually fun! It gave us great freedom, which came with great responsibility: keeping a product used by many people up and running. Downtime would seriously affect our users because they were doing business through our platform.
Of course, we were monitoring our systems thoroughly, but whenever an alert was triggered, the collaboration and communication that followed were a bit chaotic.
Emails are overlooked
Our monitoring solution could only send emails or SMS. So when an alert was triggered, an SMS and an email went out to roughly 30 people: 3 senior engineers per team, 2 product managers per team, and a few more cross-team stakeholders.
As you can imagine, most collaborators overlooked the emails because they were buried among the daily flood of other system and corporate emails. So email was not a channel people reacted to.
SMS, on the other hand, got a VERY quick reaction. So once an alert went out as an SMS to all collaborators, the coordination chaos started.
Slack channels are too chaotic
Internally, we used Slack as our main communication channel. Non-engineering stakeholders asked in different Slack channels whether the issue was serious and who was looking into it. The engineers, on the other hand, were all assessing the severity and investigating the issue, several of them in parallel.
Dedicated single channel
The result had to be communicated into the already open Slack channels. It was not possible to respond via SMS, so every collaborator had to switch from SMS to one of our unofficial-official Slack channels. We needed a dedicated channel for our alerts, with instant feedback on the current status.
SMS is too screamy with no ability to respond
The other problem with SMS was that it was very screamy. Especially after office hours, it was stressful for everyone to get an SMS in the middle of a date night or family dinner. And waking up to a screaming phone in the morning is quite the opposite of a relaxed start to the day.
Protect your team's off-hours
In all these cases, you had to double-check the Slack channels to see if the problem was severe and if a colleague was already solving it. We needed escalation policies so that we would not wake up every single engineer and every single stakeholder every time.
To sum it up, we needed a platform offering:
- Dedicated collaboration channel
- A channel that gets attention without being too screamy
- Instant feedback on an incident's status
- Escalation policies
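To make the last point a bit more concrete, here is a minimal sketch of what an escalation policy could look like. It is only an illustration, not how All Quiet implements it: the tier names, the delays, and the `notified_recipients` helper are all hypothetical. The idea is that an alert first reaches a small group and only widens its audience if nobody acknowledges it in time.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class EscalationTier:
    """One step of a hypothetical escalation policy."""
    delay_minutes: int            # minutes after the alert before this tier is notified
    recipients: Tuple[str, ...]   # who gets notified at this step


# Hypothetical policy: names and delays are illustrative assumptions.
POLICY = (
    EscalationTier(0, ("on-call engineer",)),
    EscalationTier(10, ("senior engineers of the team",)),
    EscalationTier(30, ("product managers", "cross-team stakeholders")),
)


def notified_recipients(
    policy: Sequence[EscalationTier],
    minutes_since_alert: int,
    acknowledged_at: Optional[int] = None,
) -> List[str]:
    """Return everyone who has been notified so far.

    Escalation stops at the minute the incident is acknowledged,
    so later tiers are never disturbed.
    """
    cutoff = minutes_since_alert
    if acknowledged_at is not None:
        cutoff = min(cutoff, acknowledged_at)

    notified: List[str] = []
    for tier in policy:
        if tier.delay_minutes <= cutoff:
            notified.extend(tier.recipients)
    return notified


if __name__ == "__main__":
    # Five minutes in, nobody has acknowledged yet: only the first tier knows.
    print(notified_recipients(POLICY, 5))
    # Acknowledged after 12 minutes: the third tier is never notified.
    print(notified_recipients(POLICY, 45, acknowledged_at=12))
```

With a policy like this, an incident that is acknowledged quickly never reaches the product managers or the wider stakeholder group, which is exactly the kind of off-hours protection we were missing.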
We looked into many products on the market, but none of them actually met our demands. So I decided to build it myself.
© 2023 All Quiet GmbH. All rights reserved.