The Challenges of “Observability”

Dan Ravenstone
6 min read · Jan 10, 2024

In the past year, I have come across three distinct areas where the concept of “observability” has proven to be challenging.

  1. Mobile Applications
  2. Artificial Intelligence and Large Language Models
  3. Culture

If you notice, I put quotes around observability; when I get to the section on culture, I hope to explain why. What I find interesting is that despite the challenges of monitoring, where it was always an afterthought, a knee-jerk reaction, or pushed to the side because of the costs incurred and a failure to understand the value it brought to an organisation's bottom line, folks like me developed this twisted passion for it. Then leadership and management attitudes towards monitoring improved, and we began to see tooling that addressed specific problems. The way companies built their service offerings became more elaborate, and identifying where bottlenecks lay or faults occurred became increasingly challenging. All-in-one, out-of-the-box solutions just couldn't keep up with the ever-growing complexity that was emerging. Observability has become the new buzzword, much to the chagrin of us purists. That aside, of the three observability challenges I list, the first seems largely ignored, the second changes how we think of observable systems, and the third focuses on the culture surrounding observability.

Mobile Applications

If you search for “mobile application monitoring”, the results are disheartening, at least in my humble opinion. Aside from the sponsored results that permeate the first five or ten suggestions, the rest centre around paid SaaS solutions (which are not so cheap). In a world where analytics is crucial to understanding the user experience, there is a massive need to obtain metrics from mobile applications so that product managers, data analysts, and developers understand what is working well and what isn’t. There has been progress in OpenTelemetry, with SDKs for Swift and Android (the latter based on the Java SDK), but some frameworks (like Flutter) make this very difficult to work with. It’s not just this, though: in the last two roles I have been in, the Mobile team has been largely ignored when it came to standard monitoring support. No conversations about best practices or the principles that should be applied. It was like the Mobile team was that cousin no one ever spoke about AND hoped would never come to the family reunion. To be perfectly frank, this section is more of a cry for help than anything else. I am hoping that the monitoring and observability community will correct my thinking and point me and my colleagues in the right direction. “Look! Here are the answers you seek!”.
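For what it’s worth, here is a minimal sketch of what manual instrumentation could look like in a Kotlin Android app, using the OpenTelemetry API that the Android support builds on. The span and attribute names are my own invention rather than an established semantic convention, and it assumes an SDK and exporter have already been configured elsewhere:

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.trace.StatusCode

fun loadProductScreen(productId: String) {
    // Instrumentation scope name is illustrative
    val tracer = GlobalOpenTelemetry.getTracer("com.example.mobile")

    val span = tracer.spanBuilder("screen.load.product").startSpan()
    try {
        // Attributes a product manager or analyst might care about;
        // the names are hypothetical, not an official convention
        span.setAttribute("app.screen", "product_detail")
        span.setAttribute("app.product_id", productId)

        fetchAndRender(productId) // the app's own logic
    } catch (e: Exception) {
        span.recordException(e)   // failures become part of the trace
        span.setStatus(StatusCode.ERROR)
        throw e
    } finally {
        span.end() // span duration approximates the screen load time
    }
}

fun fetchAndRender(productId: String) { /* placeholder for real work */ }
```

Nothing exotic: the point is that the same span-based model we take for granted on the backend can describe screen loads and user flows, if only the Mobile team were invited to the conversation.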

Artificial Intelligence and Large Language Models

Where mobile apps have been ignored, this particular field has grown by leaps and bounds in the past year. In the past, there has been some work dedicated to leveraging machine learning to identify and relate event anomalies occurring in a service, allowing engineers to proactively address potential issues and prevent significant downtime. Then, in 2015, along came OpenAI. No need to get into the full story; suffice it to say that in November 2022, OpenAI released a demo that caught on like wildfire. Now it seems everyone wants to build some form of this concept into their service offering. This is great, except: how do we make the interactions observable without compromising security or personal data, while ensuring that the user experience is, for lack of a better word, good?

A couple of weeks ago, I was involved in a “hack-a-thon” through one of our partner companies, where I learned that I know nothing about the challenges of AI and Large Language Models. I spent hours researching articles and books on the subject so I could quickly get up to speed. One thing that stood out for me was a quote from Charity Majors in one of her Honeycomb blog posts, where she stated:

LLMs are black boxes that produce nondeterministic outputs and cannot be debugged or tested using traditional software engineering techniques. Hooking these black boxes up to production introduces reliability and predictability problems that can be terrifying. It’s important to understand this, and why.

When you read further, it becomes clearer that the challenges of LLMs are vastly different. We simply can’t apply the same logic as before. It’s not just a matter of measuring request latency or successful versus failed transactions, or even of applying our preexisting concepts of service level objectives. The quality of the service goes beyond that: are there biases in the responses? Are they predictable, accurate, precise, and appropriate? How the heck do we measure that? That is a rhetorical question. Fortunately, tools have been developed (some quite new); however, this is where we should be cautious. If you are about to embark on this adventure, don’t throw all your preconceptions of observability out the window, but be prepared to add to them as you progress. Don’t assume that simply having logs, metrics, and traces with conventional open-source (or SaaS provider) tools will suffice. That will only paint half the picture. If you truly consider observability the core foundational model you wish to adhere to, you will have to expand your knowledge and tooling.
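As one starting point, here is a sketch of wrapping an LLM call in a span that carries more than latency. The attribute names are illustrative (OpenTelemetry’s generative-AI semantic conventions were still taking shape as I wrote this), and note what is deliberately left out: the raw prompt and response text, precisely because of the personal data concern above:

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry

// Illustrative result type; real LLM client libraries will differ
data class LlmResult(val text: String, val promptTokens: Long, val completionTokens: Long)

fun askModel(prompt: String, callLlm: (String) -> LlmResult): LlmResult {
    val tracer = GlobalOpenTelemetry.getTracer("com.example.llm")
    val span = tracer.spanBuilder("llm.chat").startSpan()
    return try {
        val result = callLlm(prompt)
        // Cost and size signals, but deliberately NOT the raw prompt or
        // response text, so personal data doesn't leak into telemetry
        span.setAttribute("llm.model", "example-model-v1") // hypothetical name
        span.setAttribute("llm.usage.prompt_tokens", result.promptTokens)
        span.setAttribute("llm.usage.completion_tokens", result.completionTokens)
        // Quality (bias, accuracy, appropriateness) can't be read off this span;
        // it has to come from downstream evaluation, e.g. a user thumbs-up/down
        // or an automated eval score attached to the same trace later
        result
    } finally {
        span.end() // latency still matters; it's just not the whole story
    }
}
```

The instrumentation itself is the easy half; the hard half, measuring response quality, lives outside the span and is exactly where the newer, specialised tooling comes in.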

Culture

Years before “observability” was a buzzword of sorts, monitoring was beginning to gain momentum as a serious practice. Companies were starting to invest and take a serious look at service performance and the impact that issues and downtime had on their customers. Monitoring professionals used concepts like the USE method in concert with the RED method, whereas others preferred the four golden signals from the Google SRE book. All useful starting points. Some even went as far as defining SLIs and SLOs. We were slowly getting away from SNMP, and along came a concept that has been around far longer than any of us have been alive, namely telemetry. This offered us an opportunity to understand traffic patterns, behaviours, and application performance in a much more succinct way, something I think many of us, at the time, were grateful for.
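To put one of those starting points in concrete terms, the RED method boils down to three signals per endpoint: Rate, Errors, and Duration. A rough sketch with the OpenTelemetry metrics API, where the meter and metric names are illustrative and production code would add attributes such as the route:

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry

// Meter and metric names are illustrative
private val meter = GlobalOpenTelemetry.getMeter("com.example.checkout")

// Rate: count every request. Errors: count the failures.
// Duration: a histogram of how long each request took.
private val requests = meter.counterBuilder("http.server.requests").build()
private val errors = meter.counterBuilder("http.server.errors").build()
private val duration = meter.histogramBuilder("http.server.duration").setUnit("ms").build()

fun <T> handleRequest(work: () -> T): T {
    val start = System.nanoTime()
    requests.add(1)
    return try {
        work()
    } catch (e: Exception) {
        errors.add(1)
        throw e
    } finally {
        duration.record((System.nanoTime() - start) / 1_000_000.0)
    }
}
```

Simple, and that simplicity is the trap I turn to next: the instruments are easy; the understanding is not.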

Next came the concept of observability and its three pillars: logs, metrics, and traces. This was great for those of us who understood what it meant, but over time things got hazy, confusing, and misunderstood. Thanks to our incessant need to employ “buzzwords”, monitoring and observability became synonymous and interchangeable, where they are clearly not. Recently, I got so tired of explaining that they are not the same that I created this short video to simply explain the differences. Let’s take a step back for a moment and focus on the key word, pillar. What exactly is a pillar? In this context, it would be safe to say a pillar is a tall vertical structure used as a support for a building or, more appropriately here, for a concept. Therein lies the rub. Product and engineering alike assume that once they have set up logs, metrics, or traces, they can claim, “We have done it! We have observability! Check that box and on to the next.” Adjusting that misaligned rationale has proven to be a challenge when it is so ingrained in their thinking. It’s not anyone’s fault. We have been inundated with so many “best practices” and conjectures on different fronts that it’s hard to keep up. There are discussions like DevOps vs SRE, Agile vs Scrum, or even, crazy as it seems, DevOps vs Agile. It just gets overwhelming. The select few who deeply care about observability and have made it their specialty (let’s be honest, a somewhat small group in this world) know the value proposition. We know the level of effort required to mature an observability model, and it feels like an uphill battle that keeps growing. The hill, that is, and sometimes the battle itself.

This is perhaps more an observation of some of the issues I have been noticing in the world of observability recently, and sadly, I don’t really have any solutions to present. At the end of the day, if you are like me and notice similar patterns or challenges, know that you are not alone. I am hoping that some companies (SaaS, or perhaps the open-source community) seriously consider developing better tooling for the mobile experience. With AI, be cautious; there is much to learn. Plan ahead, consider some of the potential hurdles and pitfalls, and don’t treat it the same as your other services. Every organisation has a different culture around how observability is embraced, and the maturity varies. If you are struggling to get even the core concepts across to leadership because the media or some SaaS vendor is selling them on a turn-key solution that “just works out of the box”, and you know that is simply not the case, have no fear: we have been there. There are tonnes of resources and folks out there to help you plead your case and, hopefully, start to instil best practices and thinking. Until next time, eat cake.
