Observability and security have come to the forefront of IT service delivery, a convergence that was long overdue. This was the urgent theme of the 2022 Splunk conference in Las Vegas.
The latest Atlassian outage goes to show that every cloud provider is prone to unplanned downtime sooner or later. While every company strives for the unicorn status of zero downtime, that is almost impossible to achieve in the face of “Unknown Unknowns.” I analyze the outage and offer some ways to mitigate the impact if disaster strikes you.
I was fortunate enough to be invited to attend and speak at the Refresh 2021 conference in Las Vegas earlier this month. This blog is my review of the conference.
When it comes to crisis and incident management in the cloud/digital era, HOPE IS NOT A STRATEGY! A properly set up Incident Management process should identify incidents, provide you with Root Cause Analysis (RCA), propose possible fixes, and escalate the issue to the right SRE, DevOps engineer, or SME in a matter of minutes.