AWS re:Invent – game-changer or just another show in town?
AWS used to come across as a landing place that attracted digital innovators to experiment, innovate, and then productionize. They always had a good story for attracting bleeding-edge innovators. This time I felt they missed that beat a little. Overall, the event came across as less innovative and more incremental to what they already have, with no earth-shattering new initiatives that blew me away. That could be because they wanted to play it safe with the change at the helm. What do you think?
An up-close and personal view of Fresh conference Refresh 2021
I was fortunate enough to be invited to attend and speak at the Refresh 2021 conference in Las Vegas earlier this month. This blog is my review of the conference.
Crisis/Incident Management in the Digital Era
When it comes to crisis and incident management in the cloud/digital era, HOPE IS NOT A STRATEGY! A properly set up Incident Management process should identify incidents, provide a Root Cause Analysis (RCA), propose possible fixes, and escalate the issue to the right SRE, DevOps engineer, or SME in a matter of minutes.
Report: Data Done Right for AIOps with RDA
Most AIOps companies get the process right, and some use AI and ML properly, but most fail at automating data processing, or DataOps: getting the right data to AIOps tools at the right time. In this eBook, "Data Done Right for AIOps," I discuss this in detail and offer some possible solutions, including Robotic Data Automation (RDA).
Edge visibility & architecture chat with Mark Thiele, CEO, Edgevana.
I am very honored to be part of the Edgevana podcast series, talking to the legendary Mark Thiele about edge, AI, AIOps, total observability at the edge, and other related topics.
AIOps Has a Data(Ops) Problem
Modern complex systems are easy to develop and deploy but extremely difficult to observe. Their IT Ops data gets very messy. If you have ever worked with modern Ops teams, you will know this. There are multiple issues with data, from collection to processing to storage to getting proper insights at the right time.
Report: Observability deep dive report for Zebrium
Summary: I did a deep-dive vendor research report on Zebrium, which specializes in automatic root cause analysis using machine learning. Quick summary from the report: Zebrium is an Observability/AIOps platform that uses unsupervised machine learning to auto-detect software problems and automatically find root causes, reducing manual labor and speeding […]
Achieving Reliable Observability Part 1 – Making Cloud-Native Observability More Robust
I was having a conversation with a CxO-level customer as part of an AIOps/Observability workshop, and from what I could tell, most organizations are confused about how to properly operationalize cloud-native production environments – especially the monitoring/observability portion. Here is how the conversation went.
What is AIOps? – AI for IT operations explained
Every business now depends on IT. Efficient IT operations are mandatory for all businesses, especially those operating in a hybrid mode – a mix of existing data centers and multi-cloud locations. As with any business process, IT operations can be augmented with machine learning-based solutions. IT is particularly fertile ground for AI because it is mostly digital, has seemingly endless processes requiring automation, and generates gigantic amounts of data to process.
Report: GigaOm Radar for Cloud Observability
Summary: Observability is an emerging set of practices, platforms, and tools that goes beyond monitoring to provide insight into the internal state of systems by analyzing external outputs. It’s a concept that has its roots in 19th-century control theory and is rapidly gaining traction today. Of course, monitoring has been […]