Learn how to resolve common Kubernetes issues faster using practical troubleshooting techniques and real-world debugging examples.

Despite its popularity, Kubernetes can stump even the most seasoned DevOps engineers. While it excels at running containerised applications at scale, it also presents its own set of troubleshooting challenges.

In this post, we’ll explore the top 10 Kubernetes troubleshooting techniques that every DevOps engineer should master. These K8s troubleshooting tips come from real-world scenarios, showing how to solve common and critical Kubernetes issues quickly and reliably.


1. Fixing CrashLoopBackOff Errors in Pods

One of the most common and frustrating issues in Kubernetes is a pod that repeatedly crashes and restarts, a situation known as the CrashLoopBackOff error. This occurs when a container fails to start correctly and Kubernetes continually attempts to restart it, resulting in a loop of failures.

Step 1: List All Pods

The first step in debugging this error is to get a high-level overview of all pods running in your namespace. You can do this with the following command:


kubectl get pods

This will show the status, restart count, and age of each pod. Pods with a CrashLoopBackOff status clearly indicate an issue requiring immediate attention.

Step 2: Describe the Affected Pod

Once you’ve identified the problematic pod, use the describe command to inspect its internal details:


kubectl describe pod <pod-name>

This provides configuration information, recent events, and messages that might indicate the reason for the crash. Pay special attention to the Events section, which can help pinpoint issues such as failed image pulls, missing configuration, or permission errors.

Step 3: Review Container Logs

Logs are essential for understanding what went wrong inside a container. Use the following command to access a pod’s logs:


kubectl logs <pod-name> --previous

If the pod fails before producing logs, you might find the logs are empty. In such cases, use the --previous flag to inspect the logs from the last failed container instance. This often reveals the root cause, especially if the container exited before any meaningful activity.

If you’re trying to debug pods in real-time, don’t miss our detailed guide on tailing logs with kubectl, which walks you through using --tail, -f, and more advanced flags for live troubleshooting.

Example Scenario:

Let’s walk through a practical example:

  1. Run kubectl get pods and observe the output:

kubectl get pods
my-webapp-pod   0/1   CrashLoopBackOff   5 (2m ago)

The pod my-webapp-pod is crashing repeatedly.

2. Try describing the pod:


kubectl describe pod my-webapp-pod

If this yields no helpful insight, the issue might be within the container logs.

3. Check the logs of the previous instance:


kubectl logs my-webapp-pod --previous
Error: DATABASE_URL environment variable is not set
Process exiting with code 1

The logs show that the DATABASE_URL environment variable was missing, which explains why the container failed: the application inside expected a configuration value that wasn't provided.

CrashLoopBackOff errors often result from misconfigurations such as missing environment variables, incorrect commands, or failed dependencies. By systematically inspecting pods, describing their events, and reviewing logs (including those from previous container instances), you can efficiently identify and resolve the underlying cause.
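
If a missing environment variable such as DATABASE_URL is the culprit and the pod is managed by a Deployment, one quick way to fix it is kubectl set env, then watch the pods to confirm the new replica reaches Running. The deployment name and connection string below are placeholders, not values taken from the cluster above:


kubectl set env deployment/my-webapp DATABASE_URL=postgres://db.example.com:5432/mydb
kubectl get pods -w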

2. Kubernetes Troubleshooting Deployment Failures: ImagePullBackOff

When Kubernetes cannot pull a container image due to authentication issues or an incorrect image name, it triggers an ImagePullBackOff error.

Step 1: Identify Problematic Deployments

Start by checking the status of your deployments:


kubectl get deployments

This command displays all deployments and their replica counts. Pay close attention to the READY column. For example, a “0/3” status means that none of the pods are starting successfully, suggesting an issue at the pod level rather than with the application itself.

For deeper insight, run the following:


kubectl describe deployment <deployment-name>

This provides detailed deployment data, including the pod template, conditions, and recent events. You may see messages like “ReplicaSet failed to create pods,” which can indicate underlying issues.

Step 2: Monitor Rollout Status and History

To track a deployment rollout in real time:


kubectl rollout status deployment <deployment-name>

This is useful when monitoring deployments during CI/CD pipeline runs.

To view the deployment history:


kubectl rollout history deployment <deployment-name>

Use this to identify which revision introduced the issue and compare it with previous working versions.
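
For example, assuming the deployment is named my-app, you can inspect a specific revision in detail and, if it turns out to be the one that broke things, roll back to the last known-good revision:


kubectl rollout history deployment my-app --revision=2
kubectl rollout undo deployment my-app --to-revision=2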

Step 3: Investigate the Root Cause of ImagePullBackOff

Let’s list the pods and spot the error:


kubectl get pods
my-app-7d4b8c8f-xyz   0/1   ImagePullBackOff   0   2m

Now, inspect the failing pod:


kubectl describe pod my-app-7d4b8c8f-xyz
Failed to pull image "private-registry.com/my-app:v1.2.3":
Error response from daemon: pull access denied for private-registry.com/my-app

This indicates that Kubernetes cannot access the private container registry due to missing or invalid credentials.

Step 4: Fixing the ImagePullBackOff Using Secrets

To resolve this, create a Kubernetes Secret to store your private registry credentials securely:


kubectl create secret docker-registry my-registry-secret \
  --docker-server=private-registry.com \
  --docker-username=myuser \
  --docker-password=mypassword \
  --docker-email=<your-email>

Now, patch your deployment to reference the secret:


kubectl patch deployment my-app -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"my-registry-secret"}]}}}}'

Once patched, Kubernetes will re-trigger the deployment using the correct credentials. You can monitor the new rollout using:


kubectl rollout status deployment my-app

3. Kubernetes Troubleshooting: Fixing NotReady Node Errors

One of the most common Kubernetes troubleshooting issues DevOps engineers face is the NotReady status on a node, which blocks pod scheduling and disrupts workloads.
If the kubelet on the node cannot communicate with the API server, or the node fails health checks, the NotReady status appears, preventing pods from being scheduled and often leading to application downtime.

Step 1: Checking Node Status

First, check the status of all nodes:


kubectl get nodes -o wide

This command lists all cluster nodes, showing their status, container runtime, IP addresses, and OS details. The -o wide flag gives additional context, helping identify node-level issues such as OS mismatches or node location patterns (e.g., subnet-specific failures).

Step 2: Inspecting Node Conditions and Issues

Use the following command to get a detailed view of a node’s resource capacity and health:


kubectl describe node <node-name> | grep -A 5 "Capacity\|Allocatable"

This will show whether the node’s allocatable resources are within limits or are being over-utilized.

For example, let’s say you have a node showing a NotReady status:


kubectl get nodes
worker-node-1   NotReady      5d   v1.28.0

Step 3: Investigating the Node Further


kubectl describe node worker-node-1
Conditions:
Type             Status  LastHeartbeatTime   LastTransitionTime   Reason                 Message
DiskPressure     True    Mon, 01 Jan 2024    Mon, 01 Jan 2024     KubeletHasDiskPressure kubelet has disk pressure

Fixing Disk Pressure Issues

In this case, the node is reporting disk pressure, typically because log files or full partitions have consumed excessive disk space.

To resolve this, clear out system logs using:


sudo journalctl --vacuum-time=3d

This will remove logs older than 3 days and free up disk space, helping the kubelet return the node to a Ready state.
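
After freeing space, it is worth confirming that the kubelet is healthy and watching the node transition back to Ready. This assumes you have SSH access to worker-node-1:


sudo systemctl status kubelet
kubectl get nodes -w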

4. Diagnosing Service and Networking Problems: The Pending Error

When services or pods are stuck in a Pending state, Kubernetes troubleshooting becomes essential. This often indicates a selector mismatch, networking misconfiguration, or issues with DNS resolution.

Service connectivity problems are among the most frustrating issues in Kubernetes. Start by listing all services to verify their configuration:


kubectl get services --all-namespaces

This command returns service types and network details, including IP addresses and ports.

Step 1: Verifying Services

To identify if a service lacks matching pods, use the following command to list endpoints:


kubectl get endpoints

If an endpoint is empty, it means no pods match the service’s selector, which is commonly the root cause of connection failures.

You can further inspect the service configuration to verify the labels used to locate pods:


kubectl describe service <service-name>

Compare the selector labels in the service with the actual labels on your pods to ensure they align.
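
For example, if the service selects pods with app=my-app (a label assumed here purely for illustration), you can list all pod labels and then check whether any pods actually match:


kubectl get pods --show-labels
kubectl get pods -l app=my-app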

Step 2: Inspecting DNS Issues In The Cluster

If there’s a communication failure between microservices that causes the Pending state, DNS resolution might be the culprit. You can run the following commands from within a pod:


nslookup my-service
nslookup my-service.default.svc.cluster.local

If the DNS name does not resolve to an IP address, there’s likely an issue with the cluster’s DNS configuration.
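
In that case, check that the cluster DNS pods (usually CoreDNS, labelled k8s-app=kube-dns in the kube-system namespace) are running, and review their logs for errors:


kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50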

Testing HTTP Connectivity

To ensure that the service is responding correctly, test its endpoint with:


wget -qO- my-service:80/health

If the request fails, it could indicate a problem with the service configuration, network policies, or incorrect pod selectors.

5. Kubernetes Troubleshooting High Resource Usage: Solving OOMKilled Errors

Monitoring resources is a crucial part of Kubernetes troubleshooting, enabling the maintenance of healthy clusters and ensuring optimal application performance. When a container exceeds its allocated memory limit, Kubernetes forcefully terminates it, resulting in the infamous OOMKilled error. This can cause pod evictions, application downtime, or severe performance degradation.

Step 1: Checking Resource Usage 

To identify memory-intensive nodes and pods:


kubectl top nodes

This command provides real-time resource usage across nodes. If a node is using over 80% of its memory, it may be at risk of triggering OOMKilled errors.

You can also view pod-level resource usage and sort by CPU or memory:


kubectl top pods --all-namespaces --sort-by=cpu
kubectl top pods --all-namespaces --sort-by=memory

These commands help you pinpoint the most resource-hungry pods in the cluster.

Track OOMKilled and other errors in real time with Middleware's K8s agent.

Step 2: Investigating Resource Quotas, Limits, And Autoscaling

Understanding and monitoring resource limits is essential. To check quotas across all namespaces:


kubectl describe quota --all-namespaces

Memory leaks often appear as a gradual increase in memory usage over time. You can monitor a pod's live memory consumption every 5 seconds using:


watch -n 5 'kubectl top pod memory-hungry-app'

This helps you spot memory leaks or gradually increasing memory usage.

Avoid memory issues before they crash your pod. See how Middleware detects them early via Kubernetes Monitoring.

Step 3: Checking Pod Resource Requests and Limits

If pods lack memory limits, they can consume excessive resources, potentially affecting other workloads. To inspect resource requests and limits:


kubectl describe pod memory-hungry-app | grep -A 10 "Requests\|Limits"

This is a useful Kubernetes troubleshooting trick to identify which pods lack proper memory restrictions.
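
If a pod has no limits at all, one way to add them without editing YAML is kubectl set resources. The deployment name and values below are illustrative and should be tuned to your workload:


kubectl set resources deployment memory-hungry-app --requests=memory=256Mi,cpu=250m --limits=memory=512Mi,cpu=500m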

To prevent OOMKilled errors and automatically balance resource usage, you can set up Horizontal Pod Autoscaling (HPA).

Pro Tip: Use autoscaling to manage resource spikes. For example, to autoscale a deployment based on 70% CPU usage:


kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10

You can verify if autoscaling is active and functioning correctly:


kubectl get hpa
kubectl describe hpa my-app

Proactively managing resource limits and enabling autoscaling helps prevent OOMKilled errors and ensures smoother application performance.

6. Kubernetes Troubleshooting Storage: Resolving PVC Pending Errors

The PersistentVolumeClaim (PVC) Pending status is a common storage issue in Kubernetes, preventing applications from accessing persistent data. This typically results from misconfigured storage classes, missing volume provisioners, or insufficient available storage in the cluster.

Step 1: Inspecting PV and PVC Status

Start by listing all Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) across all namespaces. This command provides an overview of their status, access modes, capacity, and whether they are bound:


kubectl get pv,pvc --all-namespaces

Step 2: Troubleshooting Mounting Issues

To further investigate an unbound PVC stuck in the Pending state, use the following command:


kubectl describe pvc <pvc-name>

Check the Events section at the bottom of the output. It often reveals the root cause, such as:

  • No matching PersistentVolume available
  • Storage class mismatch
  • Insufficient capacity
  • Missing provisioner

Step 3: Verifying Storage Classes

Incorrect or non-existent storage classes are a common culprit. List all available storage classes:


kubectl get storageclass

Then, describe a specific one to inspect details like the provisioner and parameters:


kubectl describe storageclass <storageclass-name>

Ensure that your PVC references a valid, correctly configured storage class. If the specified provisioner does not exist or is misspelled, the volume will fail to provision.

Step 4: Common Mistakes and Resolution

Suppose you’ve defined a PVC that references a storage class named fast-ssd, but it’s failing to provision. Run:


kubectl describe pvc my-data-pvc

You might see an error like:


Warning  ProvisioningFailed  3m    persistentvolume-controller 
storageclass.storage.k8s.io "fast-ssd" not found

Now, list all available storage classes to confirm:


kubectl get storageclass

If fast-ssd is missing, but gp2 or standard exists, update your PVC to use a valid class.
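
For instance, a corrected PVC referencing the gp2 class might look like the following, where the claim name, size, and storage class are placeholders for your own values:


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 10Gi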

7. Using Event and Audit Logs: Deep System Analysis 

Kubernetes provides two powerful tools for debugging: events and audit logs. These help you track what happened, when it happened, and why, giving you a timeline of system activities for root cause analysis.

Step 1: Understanding Kubernetes Events

Events in Kubernetes record what’s happening inside the cluster. You can list events across all namespaces and sort them by their creation time to see the most recent activity at the bottom. This helps correlate issues with recent system behavior.

For a more comprehensive understanding of how Kubernetes handles logs across nodes and clusters, check out our complete guide to Kubernetes logging.

To view all events sorted by time:


kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'

To filter events that occurred after a specific time:


kubectl get events --field-selector='lastTimestamp>2023-10-01T10:00:00Z'

To view only warning-type events (which often indicate potential problems):


kubectl get events --field-selector type=Warning

You can also monitor events in real-time using the --watch flag. This is helpful when you’re actively troubleshooting and want to immediately observe what happens after deploying or modifying resources:


kubectl get events --watch

If you’re investigating a specific pod, deployment, or service, you can filter events to focus only on that object. For example:


kubectl get events --field-selector involvedObject.name=my-pod

In case you’re dealing with pods not getting scheduled, you can filter events with reason set to “FailedScheduling”. This will show why Kubernetes couldn’t place the pod on a node, such as due to insufficient resources or affinity conflicts:


kubectl get events --field-selector reason=FailedScheduling

Step 2: Using Audit Logs for In-Depth Troubleshooting

While events help you understand what's happening, audit logs let you see who did what at the API level, which is essential for security investigations or when tracking administrative actions.

Audit logs are not enabled by default. To enable them, you must configure an audit policy. Here’s a sample audit policy configuration that captures detailed logs for core resources like pods, services, deployments, etc.:


apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
  - group: ""
    resources: ["pods", "services"]
  - group: "apps"
    resources: ["deployments", "replicasets"]
- level: Request
  resources:
  - group: ""
    resources: ["configmaps", "secrets"]

Once configured, the audit log records entries like the following (in this case, a pod deletion in the production namespace):


{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "RequestResponse",
  "auditID": "4d2c8b7a-f3e1-4b2a-9c8d-1e3f5a7b9c2d",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/production/pods/web-app-7d4b8c9f-xyz",
  "verb": "delete",
  "user": {
    "username": "[email protected]",
    "groups": ["system:authenticated"]
  },
  "sourceIPs": ["192.168.1.100"],
  "userAgent": "kubectl/v1.28.0",
  "objectRef": {
    "resource": "pods",
    "namespace": "production",
    "name": "web-app-7d4b8c9f-xyz"
  },
  "responseStatus": {
    "code": 200
  },
  "requestReceivedTimestamp": "2024-01-15T10:30:00.000Z",
  "stageTimestamp": "2024-01-15T10:30:00.123Z"
}

Once audit logging is enabled, logs will show important details such as:

  • Which user made the API request
  • From which IP address
  • The HTTP verb used (e.g., GET, POST, DELETE)
  • The resource affected (e.g., pod, deployment)
  • The timestamp of the action
  • Whether the request succeeded

Example: An audit log might show that a pod was deleted at a specific time, by a specific admin user, from a certain IP address. This level of transparency is crucial when diagnosing problems caused by accidental or unauthorized changes.

8. Using Kubernetes Dashboard and Visual Tools

While command-line tools like kubectl offer powerful ways to inspect your Kubernetes cluster, visual tools simplify cluster management, especially when identifying patterns across metrics, logs, and events.

Step 1: Kubernetes Dashboard Overview

The Kubernetes Dashboard is a web-based user interface that lets you manage cluster resources visually. It provides detailed insights into deployments, resource usage, logs, and events, making it easier to diagnose issues without needing to run multiple CLI commands.

The Dashboard is not installed by default, and it is often avoided in production environments due to security concerns. However, it can be deployed manually as follows:

  1. Deploy the Dashboard: Run the following command to apply the recommended configuration:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

2. Create a service account for access.


kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin

3. Generate Access Token:


kubectl create token dashboard-admin -n kubernetes-dashboard
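
4. Access the Dashboard: a common approach is to start a local proxy and open the Dashboard proxy URL in your browser, then sign in with the token generated above (URL shown for the default install):


kubectl proxy
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/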

Once deployed, the Dashboard allows you to:

  • Monitor CPU and memory usage over time
  • Visualize event timelines
  • Explore relationships between Kubernetes resources
  • Stream application logs directly in your browser

Example Use Case:

Suppose your application experiences intermittent failures. The Dashboard may show that CPU usage spikes align with these failures, and the events log shows that pods are being OOMKilled. This kind of pattern is easier to identify visually than by reading raw CLI logs.

Step 2: Alternative Visualisation Tools like Middleware

While the Kubernetes Dashboard is helpful, it has limitations in terms of full observability. Tools like Middleware enhance visibility by combining metrics, logs, traces, and alerting into a single view. They support custom dashboards and real-time insights into your entire Kubernetes environment.

To install the Middleware agent as a DaemonSet:


apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: middleware-agent
  namespace: middleware
spec:
  selector:
    matchLabels:
      app: middleware-agent
  template:
    metadata:
      labels:
        app: middleware-agent
    spec:
      containers:
      - name: middleware-agent
        image: middleware/agent:latest
        env:
        - name: MW_API_KEY
          valueFrom:
            secretKeyRef:
              name: middleware-secret
              key: api-key
        - name: MW_TARGET
          value: "https://api.middleware.io"
        volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys

Once integrated, you can visualise your pods, collect metrics from various sources, and monitor the status of your deployments.

9. Implementing Health Checks and Probes

Health checks in Kubernetes function similarly to routine medical checkups, helping to detect issues early and ensuring everything is functioning as expected.

Kubernetes uses probes to monitor the health and availability of your application containers. These probes enable the cluster to detect issues and take automated actions, such as restarting containers or stopping traffic routing, when necessary.

Understanding Readiness and Liveness Probes

Kubernetes provides three types of probes, each serving a specific role in maintaining container health:

  1. Liveness Probe: Checks if the container is still running. If it fails repeatedly, Kubernetes restarts the container.
  2. Readiness Probe: Checks if the container is ready to accept traffic. If this fails, the container is temporarily removed from the service endpoints.
  3. Startup Probe: Provides containers with additional time to complete their startup logic before other probes begin. This is useful for applications with longer boot times.

Example: Configuring All Three Probes

Below is a configuration example that combines all three types of probes in a single deployment:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-application
  template:
    metadata:
      labels:
        app: web-application
    spec:
      containers:
      - name: web-app
        image: my-app:v1.2.3
        ports:
        - containerPort: 8080
       
        # Startup probe - gives the app time to initialize
        startupProbe:
          httpGet:
            path: /health/startup
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 30  # 30 * 5 = 150 seconds to start
          successThreshold: 1
       
        # Liveness probe - restarts container if unhealthy
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
          successThreshold: 1
       
        # Readiness probe - removes from service if not ready
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
          successThreshold: 1
       
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

How These Probes Work Together

  • Startup Probe: This is checked first. It runs every 5 seconds, allowing up to 150 seconds for the application to complete startup tasks such as initializing databases or loading configurations. During this time, other probes are paused.
  • Liveness Probe: Once the startup probe succeeds, the liveness probe takes over. It ensures the container remains healthy. If the check fails three times in a row, Kubernetes automatically restarts the container.
  • Readiness Probe: This ensures the container is prepared to handle incoming traffic. If the check fails (e.g., due to a temporary database outage), Kubernetes temporarily removes the pod from the load balancer without restarting it.
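
If a probe keeps failing, the failures appear as Warning events on the affected pod, so a quick way to confirm what Kubernetes is doing (using a placeholder pod name) is:


kubectl describe pod <pod-name> | grep -A 10 Events
kubectl get events --field-selector reason=Unhealthy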

10. Advanced Debugging Techniques

While standard Kubernetes debugging methods handle many day-to-day issues, there are times when more advanced techniques are needed, especially for diagnosing complex performance bottlenecks, unexpected application behavior, or deep network-level problems that basic tools can’t resolve.

Step 1: Using Ephemeral Containers for Live Debugging

Ephemeral containers are a powerful way to troubleshoot live applications without restarting pods or altering their state. They allow you to temporarily inject a debugging container into a running pod, ideal for production debugging where uptime is critical.

For example, to initiate a basic debugging container within a live pod:


kubectl debug <pod-name> -it --image=busybox --target=<container-name>

To include specific debugging tools (like bash, curl, dig), use an image like Ubuntu:


kubectl debug database-pod -it --image=ubuntu --target=postgres -- bash

Practical Example: Network Issue Investigation

Imagine your web application is facing intermittent connectivity issues. You can attach a debugging container with networking tools like netshoot:


kubectl debug web-app-7d4b8c9f-xyz -it --image=nicolaka/netshoot --target=web-app

Inside the debugging container, you can perform several diagnostics:

Check service connectivity:


ping database-service
nslookup database-service

Test open ports:


telnet database-service 5432

Inspect networking interfaces:


ip addr show
ss -tuln

Validate DNS resolution:


dig database-service.default.svc.cluster.local

Monitor network traffic:


tcpdump -i any port 5432

Inspect running processes:


ps aux

And examine the file system: 


ls -la /app/
cat /app/config.yaml

This kind of live environment debugging allows for pinpointing issues that might only occur under real production conditions.

Step 2: Leveraging kubectl debug for Broader Scenarios

The kubectl debug command also supports more advanced operations beyond ephemeral containers:

Create a full debug copy of a pod:


kubectl debug web-app-7d4b8c9f-xyz --copy-to=web-app-debug --image=ubuntu --set-image=web-app=ubuntu -- sleep 1d

Then open an interactive shell in the debug copy:


kubectl exec -it web-app-debug -- bash

Debug at the node level: You can launch a privileged pod on a node to investigate node-level issues:


kubectl debug node/worker-node-1 -it --image=ubuntu

Inside the privileged container, you can access the host’s filesystem and services:


chroot /host bash
systemctl status kubelet
journalctl -u kubelet -f

Add profiling containers for performance analysis: If you’re looking into CPU profiling or memory leaks, a container with Go or another profiling tool can help:


kubectl debug web-app-7d4b8c9f-xyz -it --image=golang:1.21 --target=web-app

Why These Techniques Matter

Advanced debugging isn’t just about having extra commands; it’s about having the flexibility to access low-level details without affecting production workloads. With ephemeral containers, node-level access, and full pod duplication, you can troubleshoot virtually any problem live and in context, minimizing guesswork and downtime.

Using Monitoring and Tracing Tools like Middleware

While built-in tools like kubectl are essential for initial debugging, they can fall short when dealing with complex, distributed systems. This is where third-party observability platforms come into play. Middleware, for instance, offers a comprehensive observability solution that provides deeper visibility into your Kubernetes clusters.

Middleware enables you to track critical Kubernetes metrics, including node health, pod performance, and resource utilization. It collects data from sources such as kube-state-metrics and the Metrics Server to provide real-time, actionable insights.

The Middleware agent offers an intuitive dashboard that visualizes metrics across your cluster. It enables you to:

  • Monitor pod activity and identify issues using unique pod IDs.
  • Analyze node-level CPU and memory usage.
  • Track the health and status of deployments over time.
  • Detect performance bottlenecks and configuration mismatches quickly.

Ready to gain complete visibility into your Kubernetes clusters?

Try Middleware for free and start monitoring your pods, nodes, and workloads in real-time — with zero configuration hassle.

By integrating Middleware into your debugging workflow, you gain a centralized view of your infrastructure, which significantly improves your ability to detect, investigate, and resolve issues.

From the Middleware dashboard, you can visualise your pods for troubleshooting, inspect individual nodes to view their CPU usage, and monitor the status of your deployments over time.

Conclusion

Effectively troubleshooting Kubernetes relies on knowing when and how to apply the right debugging approach. Tools like kubectl, events, and audit logs are crucial for day-to-day debugging. However, combining these with a dedicated Kubernetes observability platform, such as Middleware, enhances visibility, reduces MTTR (mean time to resolution), and ensures smoother operations across your Kubernetes environment.