Choosing the right stack for cloud observability

Migrating to the cloud

For most organizations, migrating to the cloud is a significant step that often requires reshaping the application and infrastructure monitoring stack. This can be a complex task, with several challenges to overcome to achieve true observability. Understanding and addressing these challenges is crucial for a successful cloud strategy.

Understanding the importance of observability strategy

Observability means more than monitoring and alerting. It should offer a complete view of your infrastructure and applications, showing how the components interact with one another by correlating metrics, logs, and traces.

Observability is often left out entirely, or pushed to the bottom of the priority list, when the cloud migration strategy is drawn up. This can lead to:

Over- or under-provisioning of the infrastructure

Inability to get application insights and identify possible bottlenecks early on

Reduced ability of developers to optimize project code

An insufficient project budget, because the real costs of observability were not considered

Lack of resources and expertise to handle complex observability stacks

Too little time to choose properly between competing tools, which in turn leads to frustration and increased costs while searching for the best fit

Identifying Specific Requirements

To start understanding your specific needs and objectives, you need to consider the following factors:

Size and complexity of the infrastructure

Types of applications to be deployed

Compliance requirements

Flexibility and customization

Team’s skill set

One of the most common mistakes we see is deploying a monitoring solution with its default configuration and expecting it to meet all observability needs. Most default configurations focus on system metrics rather than application insights. While system metrics are useful, they are only a small part of a comprehensive solution.

For instance, one of our clients faced significant issues after moving to the cloud. Despite having lots of metrics, none were useful for troubleshooting the application stack.

So, we collaborated on developing and implementing a tailored observability stack suited to their specific needs. This involved separating each component of the stack, gathering specific metrics for each, and then correlating them to make it easier to identify the root cause of issues.
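As a rough illustration of what correlating per-component signals can look like, the Python sketch below lines up slow metric samples with the error logs recorded for the same component in the same time window. The component names, fields, window size, and threshold are all hypothetical and are not taken from the client's actual setup.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MetricSample:
    component: str      # hypothetical component name, e.g. "api" or "db"
    timestamp: int      # seconds since epoch
    latency_ms: float


@dataclass
class LogEntry:
    component: str
    timestamp: int
    level: str
    message: str


def correlate(metrics, logs, window_s=60, latency_threshold_ms=500.0):
    """Return, per component, the error messages logged in the same time
    window as a latency sample above the threshold."""
    # Index error logs by (component, time bucket).
    errors = defaultdict(list)
    for entry in logs:
        if entry.level == "ERROR":
            bucket = entry.timestamp // window_s
            errors[(entry.component, bucket)].append(entry.message)

    suspects = defaultdict(list)
    for sample in metrics:
        if sample.latency_ms > latency_threshold_ms:
            key = (sample.component, sample.timestamp // window_s)
            for message in errors.get(key, []):
                # Avoid listing the same message twice for one component.
                if message not in suspects[sample.component]:
                    suspects[sample.component].append(message)
    return dict(suspects)


metrics = [
    MetricSample("api", 1000, 120.0),
    MetricSample("db", 1010, 900.0),
    MetricSample("db", 1015, 950.0),
]
logs = [
    LogEntry("db", 1005, "ERROR", "connection pool exhausted"),
    LogEntry("api", 1002, "INFO", "request served"),
]
print(correlate(metrics, logs))
```

Real stacks usually correlate on a shared identifier such as a trace ID rather than on time buckets alone. The time-window approach is used here only because it needs no instrumentation changes.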

Identifying Skill Shortage and Engineering Overheads

Most organizations face a real challenge due to a lack of observability skills. It is important to identify this gap early so that you can account for it in your project strategy.

Imagine this scenario: you invest in a top-tier observability tool packed with features and integrations that perfectly fit your cloud infrastructure. You also get paid support and hands-on help with the initial setup. But once you’re in the day-to-day operations, you realize your team lacks the skills to manage such a complex system. You end up calling support more often than you’d like. Over time, the solution becomes a heap of unused features.

We encountered this situation firsthand, and it was deeply frustrating. Without the right experts to manage and interpret the data, we couldn’t unlock the full potential of the tool, which meant we didn’t experience its true value. Instead of enabling proactive monitoring, it left us constantly reacting to issues. This experience taught us that having the best tools is only part of the equation; they need to be user-friendly and manageable by your team to truly be effective.

Ensuring Seamless Integration

Each tool in your observability stack should be easy to integrate with your existing ecosystem of tools and technologies.

Evaluate each tool's compatibility with popular cloud platforms, container orchestration systems, and other monitoring frameworks, such as OpenTelemetry, which is likely to shape the future of observability.

Being locked into a closed platform can hinder innovation and prevent you from taking advantage of the many open-source projects available.

We try to avoid solutions that lock us into a single vendor, such as Azure Monitor or Amazon CloudWatch. While these tools aren't necessarily bad, our experience has shown they can sometimes feel inflexible for our specific needs. Additionally, when moving away from a public cloud provider, relying on these closed platforms may require a complete monitoring redesign. Many current and potential clients consider Azure Monitor their default choice when migrating to Azure, but we believe effective observability requires a more comprehensive approach.

Analyzing Cost and Licensing Structures

Cost and licensing can play a huge role in decision-making, especially if your organization has budget constraints. You should compare all pricing models, including subscription-based plans, usage-based plans, and free tiers, and include other costs such as storage and compute.

Factors to consider:

Cloud solution or self-hosted. For self-hosted, it is important to determine the additional costs for computing resources and maintenance.

Price transparency. The aim here is to have predictable bills and avoid hidden costs.

Data retention policies and storage costs. Check the options for both short- and long-term retention.
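The comparison between these factors can be sketched as simple arithmetic. The Python example below estimates a monthly bill for a usage-based plan and for a self-hosted deployment. Every price, volume, and rate in it is a made-up placeholder, so substitute the figures from your own quotes and workloads.

```python
def usage_based_monthly_cost(ingest_gb_per_day, retention_days,
                             price_per_gb_ingested, price_per_gb_month_stored):
    """Estimated monthly cost of a usage-based plan at steady state."""
    ingested_per_month = ingest_gb_per_day * 30
    # At steady state, the stored volume is daily ingest times retention.
    stored_gb = ingest_gb_per_day * retention_days
    return (ingested_per_month * price_per_gb_ingested
            + stored_gb * price_per_gb_month_stored)


def self_hosted_monthly_cost(compute_cost, stored_gb, price_per_gb_month,
                             maintenance_hours, hourly_rate):
    """Estimated monthly cost of self-hosting, including engineering time."""
    return (compute_cost
            + stored_gb * price_per_gb_month
            + maintenance_hours * hourly_rate)


# Hypothetical inputs: 20 GB/day of telemetry kept for 30 days.
saas = usage_based_monthly_cost(
    ingest_gb_per_day=20, retention_days=30,
    price_per_gb_ingested=0.50, price_per_gb_month_stored=0.03,
)
self_hosted = self_hosted_monthly_cost(
    compute_cost=150, stored_gb=20 * 30, price_per_gb_month=0.02,
    maintenance_hours=8, hourly_rate=60,
)
print(f"usage-based: {saas:.2f} per month")
print(f"self-hosted: {self_hosted:.2f} per month")
```

With these placeholder numbers, engineering time is the largest part of the self-hosted estimate, which is why maintenance effort belongs in the comparison alongside compute and storage.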

We usually recommend relying more on metrics than on logs (where possible) to avoid high storage usage and costs. When logs are really needed, they should be collected selectively and parsed properly, as most logs do not follow a standard format (a multiline log, for example, can generate hundreds of documents instead of just one).

For instance, one of our local government clients requires long-term retention (years) for audit events and specific logs due to legal requirements. For this case, the go-to solution was Elasticsearch, a popular and effective tool for storing logs. Coupled with Elastic Agent, it allowed us to parse and store logs in the compact, standardized Elastic Common Schema (ECS) format. This reduced the storage volume needed and lowered the overall costs. Elasticsearch also has other features that can be leveraged for long-term retention, such as warm and cold data tiers.
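To illustrate the multiline problem mentioned above: if every line of a stack trace is shipped as its own document, a single exception turns into many documents. The Python sketch below shows the underlying idea of joining continuation lines before shipping. It assumes, hypothetically, that every new event starts with a date such as 2024-05-01. It is not how Elastic Agent is configured, only a minimal model of what multiline handling does.

```python
import re

# Assumption: a new log event always starts with a date like "2024-05-01".
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}")


def group_multiline(lines):
    """Merge continuation lines (e.g. stack trace frames) into the event
    that precedes them, so each event becomes a single document."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if EVENT_START.match(line) or not events:
            # Start of a new event (or the very first line seen).
            events.append(line)
        else:
            # Continuation line: attach it to the previous event.
            events[-1] += "\n" + line
    return events


raw = [
    "2024-05-01 10:00:00 ERROR Unhandled exception",
    "Traceback (most recent call last):",
    '  File "app.py", line 10, in handler',
    "ValueError: bad input",
    "2024-05-01 10:00:01 INFO Request served",
]
events = group_multiline(raw)
print(len(raw), "lines ->", len(events), "events")
```

Here five raw lines collapse into two events, and the same effect at scale is what keeps document counts and storage volume down.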

Assessing the level of support and the long-term future

Evaluate the vendor’s documentation, knowledge base, forums, and user groups to understand the level of community engagement and support available. Additionally, consider the vendor’s track record in delivering timely updates, patches, and feature enhancements to address issues and feedback from users.

Choosing the right tools is essential

Choosing the right tools for cloud observability is essential for your organization. It’s not just about monitoring; it’s about gaining a clear view of your entire system. By focusing on what your team needs, the skills you have, and how well tools work together, you can make a smart choice.

Keep it simple and practical. Look at the costs, make sure you understand them, and choose tools that fit your budget. Consider the long-term support and updates offered by the vendors. This will help you avoid surprises and ensure you have reliable tools.

In the end, the right observability stack will help you catch problems early, optimize performance, and support your team’s efforts. By taking a thoughtful approach, you can find a solution that fits your organization and helps you succeed in the cloud.

If you’re unsure where to start or need further guidance, feel free to reach out to us. We can support you on your cloud observability journey.

By Adrian Marcu
Senior DevOps Engineer