Report: GigaOm Radar for Cloud Observability
Summary
Observability is an emerging set of practices, platforms, and tools that goes beyond monitoring to provide insight into the internal state of systems by analyzing external outputs. It’s a concept that has its roots in 19th century control theory concepts and is rapidly gaining traction today.
Of course, monitoring has been a core function of IT for decades, but old approaches have become inadequate for a variety of reasons—cloud deployments, agile development methodology, continuous deployments, and new DevOps practices among them. These have changed the way systems, infrastructure, and applications need to be observed so events and incidents can be acted upon quickly.
At the heart of the observability concept is a very basic premise: quickly learn what happens within your IT to avoid extended outages. And in the unfortunate event of an outage, you need to ensure that you can get to the root cause of it fast. Outages are measured by Mean Time To Resolution (MTTR) and it is the goal of the observability concept to drive the MTTR value to as close to zero as possible.
No surprise, building resilient service delivery systems that are available with high uptime is the ultimate end goal for any business. Achieving this goal requires executing three core concepts:
- Monitoring: This is about understanding if things are working properly in a service-centric manner.
- Observability: This is about enabling complete end-to-end visibility into your applications, systems, APIs, microservices, network, infrastructure, and more.
- AIOps: This is about using comprehensive visibility to derive meaning from the collected data to yield actionable insights and courses of action.
To achieve observability, you need to measure the golden telemetry signals—logs, metrics, and traces. Logs and metrics have been measured by IT professionals for decades, but traces is a fairly new concept that emerged as modern applications increasingly were built using distributed microservices. A service request is no longer completed by one service but rather by a composition of microservices, and as such there is an imperative to track or trace the service request from start to finish. In order to generate proper telemetry, all the underlying systems must be properly instrumented. This way enterprises can achieve full visibility into their systems to track service calls, identify outages, and determine if the impacted systems are on-premises, in the cloud, or somewhere else.
Observability is not always about introducing new tools, but about consolidating the telemetry data, properly instrumenting systems to get the appropriate telemetry, creating actionable insights, and avoiding extended outages. Comprehensive observability is core to future proofing IT infrastructure.
This report evaluates key vendors in the emerging application/system/infrastructure observability space and aims to equip IT decision-makers with the information they need to select providers according to their specific needs. We analyze the vendors on a set of key criteria and evaluation metrics, which are described in-depth in the “Key Criteria Report for Cloud Observability.”, co-written by myself and David Linthicum.
You can access the full report here if you are a GigaOm subscriber or client. Still interested? Reach out to me, I will see what I can do to get you a copy of the report.