Although Cassandra, an open-source NoSQL database, is renowned for its use in high-scale environments, monitoring it is a constant headache for many developers. You can hardly blame Cassandra: handling such huge volumes of data reliably must come at a cost.
A key cause of this headache is that the Cassandra architecture has many disparate components, and they expose so many metrics that you can easily lose sight of what matters.
This article is a deep dive into the key metrics that matter in the Cassandra ecosystem and the top ten tools that can help you simplify Cassandra monitoring.
Top Cassandra monitoring metrics
When you observe the following key metrics, you will gain insight into the activities within your Cassandra cluster and make informed decisions for optimal system reliability.
1. Dedicated Cassandra performance metrics:
These metrics describe how the Cassandra system and its components work.
a. Nodes
Nodes are individual servers that work together to store and manage data in a Cassandra cluster. Monitoring the status of each node helps you achieve a balanced workload distribution. It helps you know when a node is down, slowing down, or functional and when to repair or restart the node to avoid pressure on other nodes.
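As a rough illustration of a per-node status check, the sketch below parses text laid out the way `nodetool status` prints it and lists every node that is not both Up and Normal. The sample output, addresses, host IDs, and the helper name `find_unhealthy_nodes` are made up for illustration; in practice you would pass in the real command output.

```python
import re

# Illustrative sample in the layout `nodetool status` prints.
# Addresses and host IDs are invented for this example.
SAMPLE_STATUS = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns   Host ID                               Rack
UN  10.0.0.1   256.3 KiB   256     33.2%  11111111-1111-1111-1111-111111111111  rack1
DN  10.0.0.2   301.7 KiB   256     33.5%  22222222-2222-2222-2222-222222222222  rack1
UJ  10.0.0.3   120.1 KiB   256     33.3%  33333333-3333-3333-3333-333333333333  rack1
"""

# First column: U/D (up or down) followed by N/L/J/M
# (normal, leaving, joining, moving), then the node address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def find_unhealthy_nodes(status_text):
    """Return (address, status, state) for every node that is not Up and Normal."""
    unhealthy = []
    for line in status_text.splitlines():
        match = NODE_LINE.match(line)
        if not match:
            continue  # header, separator, or blank line
        status, state, address = match.groups()
        if (status, state) != ("U", "N"):
            unhealthy.append((address, status, state))
    return unhealthy


print(find_unhealthy_nodes(SAMPLE_STATUS))
```

Here the down node and the joining node are reported, while the healthy `UN` node is skipped.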
b. Request count
This metric tracks the number of requests made to the Cassandra cluster, measured in requests per second.
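Request counts are often exposed as a cumulative counter, so a per-second rate has to be derived from two readings. The sketch below shows one way to do that; the function name and the restart handling are assumptions for illustration, not part of any particular monitoring tool.

```python
def requests_per_second(prev_count, curr_count, interval_seconds):
    """Convert two readings of a cumulative request counter into a rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # The counter went backwards, most likely because the node restarted.
        # Assumption: treat the current reading as the count since the reset.
        delta = curr_count
    return delta / interval_seconds


# Two readings taken 60 seconds apart.
print(requests_per_second(120_000, 126_000, 60))  # 100.0
```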
c. Client error request count
This metric identifies the number of client requests that result in errors and helps assess client-side issues, which could be unavailability of data, data inconsistency, or network error.
d. Read and write performance metrics
This includes metrics such as average range slice latency, average read latency, and average write latency. By showing how long requests take to execute, these latencies help you spot issues early, for example those caused by increased load or a growing number of compactions.
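When you have individual latency samples, it can help to look at a high percentile next to the average, because a single slow request barely moves the mean. The following is a minimal sketch using the nearest-rank method; the sample values, the 20 ms limit, and the function names are illustrative assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_report(samples_ms, p99_limit_ms):
    """Summarize latency samples and flag a breach of the (assumed) p99 limit."""
    mean = sum(samples_ms) / len(samples_ms)
    p99 = percentile(samples_ms, 99)
    return {"mean_ms": mean, "p99_ms": p99, "breach": p99 > p99_limit_ms}


# Made-up read latencies in milliseconds, with one slow outlier.
read_latencies_ms = [2.0, 2.0, 3.0, 3.0, 2.0, 4.0, 3.0, 2.0, 3.0, 46.0]
print(latency_report(read_latencies_ms, p99_limit_ms=20.0))
```

In this sample the mean is 7.0 ms, yet the p99 is 46.0 ms, which is the kind of gap an average alone would hide.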
2. Operating system metrics:
These metrics provide information about the containers or virtual machines that host Cassandra:
a. CPU utilization
Monitoring CPU utilization helps gauge the level of computational resources being used by the Cassandra processes across the cluster or individual nodes. Spare CPU cycles indicate the node’s capacity for handling data and queries.
The split of CPU usage between user time and I/O wait helps pinpoint specific needs or bottlenecks in disk I/O or the network. Also consider the impact of garbage collection on CPU cycles when planning resource allocation.
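On Linux, the user and I/O wait split can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal, then guest fields); the sample lines and the function name are made up for illustration.

```python
def cpu_shares(prev_line, curr_line):
    """Return the share of CPU time spent in user mode and in I/O wait between two samples."""

    def counters(line):
        parts = line.split()
        # parts[0] is the "cpu" label; the counters follow in a fixed order.
        return [int(value) for value in parts[1:]]

    prev, curr = counters(prev_line), counters(curr_line)
    deltas = [c - p for p, c in zip(prev, curr)]
    # Sum user..steal only: guest time is already counted inside user and nice.
    total = sum(deltas[:8])
    if total == 0:
        return {"user": 0.0, "iowait": 0.0}
    return {"user": deltas[0] / total, "iowait": deltas[4] / total}


# Two made-up samples of the aggregate "cpu" line, taken a few seconds apart.
before = "cpu  1000 0 500 8000 400 0 100 0 0 0"
after = "cpu  1600 0 700 8900 600 0 200 0 0 0"
print(cpu_shares(before, after))
```

A rising `iowait` share alongside a modest `user` share would point toward the disk rather than query processing as the bottleneck.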
b. Memory usage
Keeping track of memory usage helps you ensure the system has enough memory to efficiently handle data processing. Since the latest Cassandra version utilizes both off-heap and heap memory, it is crucial to configure the heap size of your Cassandra nodes accurately and ensure sufficient off-heap memory for optimal cluster performance.
c. Disk usage
Monitoring disk usage provides visibility into storage capacity and potential disk space constraints that could impact Cassandra’s performance and data storage. It helps you determine whether a node is keeping up with compaction or falling behind. Compaction metrics can give you deeper insight into this.
Compaction merges SSTables, Cassandra’s immutable on-disk data files, and removes outdated data. Monitoring compaction counts and compacted data volume is important for maintaining enough disk space for ongoing performance. You can set alerts for low disk space to avoid compaction failures.
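A low-disk-space alert can be as simple as comparing the free fraction of the data volume with a threshold. The sketch below uses Python's `shutil.disk_usage`; the 30% threshold, the function names, and the use of the current directory as the path are assumptions for illustration, so substitute your own data directory and a limit that suits your compaction strategy.

```python
import shutil


def disk_headroom(total_bytes, free_bytes, min_free_fraction=0.3):
    """Report the free fraction of a volume and whether it is below the threshold."""
    free_fraction = free_bytes / total_bytes
    return {"free_fraction": free_fraction, "alert": free_fraction < min_free_fraction}


def check_path(path, min_free_fraction=0.3):
    """Check the volume that holds `path` (for example, a Cassandra data directory)."""
    usage = shutil.disk_usage(path)
    return disk_headroom(usage.total, usage.free, min_free_fraction)


# "." is used here only so the example runs anywhere.
print(check_path("."))
```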
3. Java Virtual Machine (JVM) metrics:
These metrics provide details about the environment in which Cassandra runs:
a. Heap memory usage
Cassandra keeps much of its in-memory working data on the JVM heap, and excessive memory allocation or inefficient memory management can lead to performance degradation and even out-of-memory errors.
Keeping a close watch on heap memory usage helps you understand how Cassandra is utilizing memory resources and reveals potential memory-related issues that might affect the system’s stability.
b. Garbage collection
Garbage collection reclaims memory that the application no longer uses. Long or frequent collection pauses can hurt the database’s responsiveness.
Monitoring garbage collection metrics, such as frequency, duration, and efficiency, helps you optimize garbage collection settings to minimize disruptions and ensure that the JVM efficiently manages memory resources.
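The JVM reports garbage collection as cumulative counters (a collection count and an accumulated collection time in milliseconds), so frequency and duration are derived from two readings. The sketch below assumes you already collect those two values, for example over JMX; the readings, the one-minute interval, and the function name are illustrative.

```python
def gc_stats(prev, curr, interval_ms):
    """Summarize GC activity between two readings.

    prev and curr are (collection_count, collection_time_ms) tuples
    taken interval_ms apart.
    """
    collections = curr[0] - prev[0]
    gc_time_ms = curr[1] - prev[1]
    return {
        "collections": collections,
        "avg_collection_ms": gc_time_ms / collections if collections else 0.0,
        "time_in_gc": gc_time_ms / interval_ms,  # fraction of the interval spent collecting
    }


# Two made-up readings taken one minute apart.
print(gc_stats((1200, 54_000), (1212, 54_900), 60_000))
```

In this example, 12 collections averaging 75 ms each account for 1.5% of the minute; a fraction that keeps climbing is a common sign that heap settings need attention.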
c. Thread counts
Thread counts are the number of active threads within the Java Virtual Machine. High thread counts indicate heavy processing and potential contention, while low thread counts might suggest an underutilization of available resources.
These metrics help you understand the level of concurrent activity and resource utilization within your database. With the metrics, you can gauge the workload on the system and fine-tune resource allocation.
Top 10 Cassandra Monitoring Tools
Many Cassandra monitoring tools are on the market, but these ten have proven their mettle:
1. Middleware
Middleware is a full-stack cloud observability platform with dashboards that offer end-to-end visibility. Its integration with Cassandra is seamless, providing you with a single dashboard where you can observe all key metrics explained earlier.
Check the Middleware Cassandra documentation to find out how quickly and easily you can monitor a Cassandra cluster.
Features
- Automatic correlation of metrics.
- Offers metric segmentation by allowing you to observe node metrics and cluster metrics.
- Causation-to-correlation feature that pinpoints where a problem begins and where it ends.
- Offers enterprise-level security and complies with GDPR, CCPA, and SOC 2.
- Provides 24/7 technical support.
Pricing
Middleware offers both free and paid plans, with a pricing model designed for modern, scalable architectures. You are charged based on your usage, with no limitation on features.
Start monitoring your Cassandra database with Middleware for free.
2. Sematext
Sematext also offers a comprehensive monitoring solution that simplifies Cassandra monitoring by consolidating metrics, logs, dashboards, and alerts in one platform. It provides you with the tools needed to troubleshoot Cassandra node performance and health.
Sematext enables the identification of slow nodes and issues with reads and writes and monitors SSTable compactions.
Features
- Powerful dashboarding capabilities for graphing diverse data.
- Auto-discovery of services for hands-off auto-monitoring of Cassandra clusters.
- Ability to observe Cassandra metrics in microservices and containerized environments, including Kubernetes and Docker.
- Alerting with anomaly detection and support for external notification services like PagerDuty, OpsGenie, and WebHooks.
Pricing
With Sematext, you pay only for what you use, based on the number of monitored hosts or containers. There’s a limited free plan available for both metrics and logs, and paid plans start at $3.60/month for metrics and $50/month for logs.
3. New Relic Cassandra Monitoring
New Relic provides comprehensive monitoring of database, JVM, and operating system metrics, offering full visibility into the Cassandra cluster and its execution environment.
With its alerting system and query language, New Relic equips you with all the necessary tools to manage and monitor Apache Cassandra-related aspects.
Features
- Log centralization and analysis for both Apache Cassandra metrics and logs.
- Application performance monitoring with dashboarding.
- Integrated alerting with anomaly detection.
- Seamless monitoring of Apache Cassandra on major cloud providers like AWS, Azure, and Google Cloud Platform.
Pricing
You can choose between user-based and data-based pricing, and the plans determine the available features and the data limits without additional fees.
Switch to Middleware from New Relic in just a few clicks.
4. Datadog
Datadog is a monitoring tool that works with various platforms and databases, including Cassandra. Through advanced tracing and latency breakdowns, Datadog streamlines monitoring for Cassandra and offers detailed insight into slow-running queries and error rates.
It provides a comprehensive suite of metrics on query throughput, execution performance, and resource usage. Datadog enables navigation between logs and metrics and the setup of anomaly-based alerts for proactive issue prevention.
Features
- Monitoring of Cassandra application performance with support for distributed tracing.
- API for working with data, tags, and dashboards.
- Ability to analyze logs and ship Cassandra logs for correlation with metrics.
- Collaboration tools that allow for team-based and cross-sectoral discussions.
Pricing
Datadog’s pricing is based on features, hosts, and usage volume. It can be billed annually or on-demand, and the on-demand option makes Datadog 17-20% more expensive than the annual pricing.
5. ManageEngine Applications Manager
ManageEngine Applications Manager enables a centralized view of nodes within the Cassandra cluster. It collects crucial statistical data, including key Cassandra metrics such as memory usage, thread pool task statistics, storage utilization, CPU performance, operation latency, and pending tasks.
Its insights cover live, leaving, moving, joining, and unreachable nodes. Applications Manager closely tracks memory consumption and provides alert notifications for high CPU usage and hardware-related performance issues.
Features
- Monitoring of Apache Cassandra with numerous additional integrations.
- Seamless monitoring of Cassandra running in Docker or Kubernetes.
- Alerting engine with notifications support.
Pricing
The pricing is determined by the version of ManageEngine Applications Manager you choose, that is, between Professional and Enterprise.
6. Dynatrace
Dynatrace streamlines Cassandra monitoring by automatically detecting essential metrics such as CPU usage and garbage collection times.
Once enabled globally, it seamlessly collects metrics upon detecting a new host running Cassandra across the entire environment, requiring no manual setup.
With its plug-and-play functionality, you can promptly optimize your Cassandra database, while its charting functionalities, like the “Exceptions and Failed requests,” enable you to track metrics and identify potential performance issues.
Features
- Monitoring Apache Cassandra performance with dashboards, as well as code-level tracing.
- First-class log analysis support with automatic detection of common application log types.
- Diagnostic tools for memory dumps, exceptions, and CPU analysis to aid in Cassandra troubleshooting.
- Integrations with Docker, Kubernetes, and OpenShift for simplified Cassandra monitoring.
Pricing
The pricing structure is feature-based. Application performance monitoring is linked to the number of hosts and the memory available on each host, and the price is calculated from the number of host units per hour. Log pricing, however, is based on volume.
7. Prometheus & Grafana
Prometheus and Grafana form a robust open-source combination, offering flexibility in backend monitoring. This setup allows monitoring of metrics beyond just Apache Cassandra.
With numerous integration configurations for both Prometheus and Grafana, constructing an observability platform for Cassandra and its environment is seamless.
So while Prometheus collects Cassandra metrics, you use Grafana dashboards to visualize and explore the metrics.
Features
- A highly dimensional datastore implementation for gathering a wide range of Apache Cassandra metrics.
- Efficient time-series storage.
- A comprehensive dashboard with graphing features.
- Out-of-the-box alerting based on Prometheus query language to create alerts for Cassandra metrics.
Pricing
Although both tools are free, you’ll need to pay for the maintenance and storage of your metrics.
8. AppDynamics APM
AppDynamics automatically discovers and displays your Cassandra metrics on its dashboard. With its out-of-the-box configurations, you can analyze your transactions through graphs and charts.
AppDynamics captures detailed information that allows users to delve deeply into the components that initiate Cassandra backend calls. The information shown may include a call graph where users can inspect the actual calls and timings of each call within their application code.
Features
- Full-stack monitoring that offers visibility into Cassandra’s top-level transactions and backend calls.
- Comprehensive infrastructure monitoring that encompasses network components, databases, and servers.
- Anomaly detection with root cause analysis reporting.
- Alerting functionality with email templating and periodic digest capabilities.
Pricing
Pricing is determined based on the features needed from the platform and the agents required. For instance, accessing vital CPU, memory, and disk metrics necessitates selecting the APM Advanced plan.
9. SolarWinds
SolarWinds offers comprehensive monitoring for Cassandra clusters and is an ideal on-premise monitoring solution for Microsoft Windows.
It provides visibility into Apache Cassandra metrics, along with Windows and Linux metrics based on the chosen environment.
With built-in alerting and dashboarding capabilities, SolarWinds is well-suited for monitoring both Cassandra and the execution environment.
Features
- Support for Microsoft Windows environment, inclusive of Cassandra monitoring.
- Built-in intelligent alerting to monitor Cassandra metrics and stay on top of critical issues.
- Ability to set alerts tailored to your Cassandra server’s thresholds to prevent slowdowns and bottlenecks.
- Seamless integration with Microsoft Windows-based services such as Active Directory or IIS.
Pricing
You can choose either a periodic subscription or a perpetual licensing option with a price starting at $1,275 alongside a 30-day free trial.
10. Zabbix
Zabbix is an open-source, enterprise-level monitoring solution designed to monitor distributed network components and servers, including Cassandra.
It supports both polling and trapping-based monitoring, offering email-based alerting that responds to any event within the monitored system. Zabbix provides a web portal for viewing metrics and configuring monitoring behavior.
Features
- Template-based host management and auto-discovery.
- Multi-lingual, multi-tenant and flexible user interface with dashboarding capabilities and geolocation support.
- Ideal for large organizations with dispersed data centers housing multiple Cassandra clusters.
- Support for customizable notifications with built-in backing for email, SMS, Slack, and more.
Pricing
It is free, but you can subscribe for support, consultancy, and training to expand your knowledge of using the solution.
Monitor Cassandra Performance with Middleware
Middleware enables metric segmentation, where you closely observe both node-specific and cluster-wide metrics.
With its smart alerting system armed with predictive algorithms, Middleware analyzes data and suggests potential remedies. Additionally, its causation-to-correlation feature identifies issues and provides a thorough root-cause analysis.
It takes a few minutes to integrate Cassandra with the Middleware agent, and accessing your Cassandra data on Middleware is easy. Simply navigate to the Dashboard Builder and select the Cassandra – Metrics Dashboard, where you can visualize all the top Cassandra metrics you need to observe.