What is Kubernetes OOMKilled? Learn why exit code 137 fires, how to confirm it with kubectl, fix memory limits and leaks, configure JVM correctly, and prevent it with VPA, LimitRanges, and Middleware monitoring.

Kubernetes OOMKilled is one of the most disruptive errors in containerized production environments. The Linux kernel kills your container mid-flight with no warning, drops all in-flight requests, and forces Kubernetes to restart the workload from scratch. If the root cause is not addressed, the cycle repeats until the pod enters CrashLoopBackOff and your service goes down.

This guide covers everything you need to detect, diagnose, fix, and prevent OOMKilled in Kubernetes: what triggers it, how to read the termination signals, the difference between container-level and node-level OOM, how QoS classes influence kill priority, and the right strategy for right-sizing memory limits without guessing.

TL;DR

  • OOMKilled means the Linux kernel killed your container for exceeding its configured memory limit.
  • It always produces exit code 137 and is confirmed by Reason: OOMKilled in kubectl describe pod.
  • Container-level OOM kills one container. Node-level OOM kills multiple pods across the node.
  • Root causes: limit set too low, memory leak, traffic spike, or JVM heap misconfiguration.
  • Quick fix: set the memory limit at the 99th percentile of observed peak usage plus a 20-30% buffer.
  • Repeated OOMKilled events without a root-cause fix lead directly to CrashLoopBackOff.
  • Middleware detects memory trends before the kill happens, so you fix it before production is impacted.

What is OOMKilled in Kubernetes?

OOMKilled (Out of Memory Killed) is a Kubernetes container termination event that occurs when a container exceeds its configured memory limit. As memory usage climbs past that limit, the Linux kernel’s OOM Killer identifies the container’s process as the highest-priority kill target and sends a SIGKILL signal, terminating it immediately without a graceful shutdown. Kubernetes records the termination with exit code 137 and restarts the container according to the Pod’s restart policy.

A normal container crash occurs when the application process exits on its own, for example after an unhandled exception or a misconfigured startup command. An OOMKilled termination is not initiated by the application. It is initiated by the Linux kernel, which forcibly kills the container process once it exceeds its memory limit, regardless of what the application is doing at that point. A normal crash produces an exit code that reflects the application’s own error. An OOMKilled event produces exit code 137.

What is exit code 137?

Exit code 137 specifically means the process was killed using the SIGKILL signal. In Linux, this value comes from adding 128 to signal 9, the signal number for SIGKILL. While exit code 137 is commonly associated with OOMKilled events, it does not always mean the process ran out of memory. Any process terminated forcefully with SIGKILL returns the same exit code. OOMKilled is confirmed only when Kubernetes also reports Reason: OOMKilled in the container or pod status.
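
The arithmetic can be checked with a short Python sketch. The helper name decode_exit_code is hypothetical and exists only for illustration; it assumes a POSIX system and the convention described above, where codes above 128 encode a signal number.

```python
import signal

def decode_exit_code(code: int) -> str:
    """Translate a container exit code into a human-readable description."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"

print(decode_exit_code(137))  # killed by SIGKILL
print(decode_exit_code(143))  # killed by SIGTERM
```

The decoder only says which signal ended the process. It cannot say who sent the signal, which is why the Reason field is still needed.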

Important: Exit code 137 and OOMKilled are not synonymous. A container that fails its liveness probe and is force-killed after ignoring SIGTERM, a manual docker kill, or a node shutdown can also produce exit code 137. Always verify Reason: OOMKilled in lastState.terminated before assuming memory is the cause.

What the OOMKilled event looks like in your cluster

kubectl describe pod <pod-name> exposes three fields that confirm an OOMKilled event: Reason, Exit Code, and Restart Count.

# Last recorded state of the terminated container
Last State:   Terminated
  Reason:     OOMKilled       # Confirms memory as the termination cause
  Exit Code:  137             # Signal-based code: 128 + SIGKILL (9)
  Started:    Mon, 12 May 2025 10:00:00 +0000
  Finished:   Mon, 12 May 2025 10:04:32 +0000
Restart Count: 3              # Number of times this container has restarted

A rising restart count combined with Reason: OOMKilled confirms the container is repeatedly hitting its memory limit. A single OOMKilled event may be a traffic spike. Three or more restarts in a short window indicates either a memory leak or a consistently undersized limit. If the memory pressure is not resolved, repeated OOMKilled events will eventually cause the container to enter CrashLoopBackOff.

Three fields confirm an OOMKilled event:

| Field | What It Tells You |
| --- | --- |
| Reason: OOMKilled | Memory is the confirmed cause |
| Exit Code: 137 | Process received SIGKILL |
| Restart Count | How many cycles have already occurred |

Interpreting restart count:

  • 1 restart — Possibly a one-off traffic spike. Monitor closely.
  • 3+ restarts in a short window — Either a memory leak or a consistently undersized limit. Root-cause investigation required immediately.
  • Escalating restarts — If left unresolved, this path leads directly to CrashLoopBackOff.

Container-level OOM vs. node-level OOM

Misidentifying which type of OOM event you are dealing with sends you to the wrong diagnosis path and wastes time.

  • Container-level OOM occurs when a single container exceeds its configured memory limit. The kernel kills only that container while every other pod on the node continues running. The fix lives in the container spec: either raise the memory limit or resolve the memory leak in the application code.
  • Node-level OOM occurs when total memory consumption across all Pods exhausts the node’s physical memory. The kernel begins terminating processes across multiple containers to reclaim memory, meaning Pods that had nothing to do with the spike can go down. The fix is not in any single container spec. It requires reducing memory footprint across the node or adding capacity to the cluster.

| | Container-Level OOM | Node-Level OOM |
| --- | --- | --- |
| Trigger | Container exceeds its limits.memory | Node physical memory exhausted |
| What gets killed | The specific container only | Multiple pods across the node |
| Other pods affected? | No | Yes, unrelated pods can die |
| Fix lives in | Container spec (limits.memory or code) | Node capacity or cluster topology |
| Diagnosis command | kubectl describe pod | kubectl describe node |

Important: Increasing a single container’s memory limit will not fix a node-level OOM event. If multiple unrelated pods are going down simultaneously, check node memory pressure with kubectl describe node before touching any container spec.

For a complete approach to tracking node memory, CPU, and control plane health, see our guide on Kubernetes Infrastructure Monitoring.

Memory management in Kubernetes

Kubernetes controls memory through two values, requests and limits, and enforces limits via Linux cgroups. Every container-level OOMKilled event is a violation of the limit boundary. Understanding how these three pieces (requests, limits, and cgroups) interact tells you exactly why your container was killed and which fix applies.

Requests vs Limits

Memory request: The amount of memory Kubernetes guarantees a container will have available on its scheduled node. The scheduler uses this value to find a node with enough headroom.

Memory limit: The hard ceiling on memory a container can use. Breach it, and the kernel kills the container.

# Memory resource configuration for a single container
resources:
  requests:
    memory: "256Mi"    # Memory guaranteed at scheduling time
  limits:
    memory: "512Mi"    # Hard ceiling — breach triggers OOMKilled

Two dangerous misconfigurations:

Limit set far above request — Creates false safety. If every container on the node tries to use its maximum simultaneously, there may not be enough physical memory to honor all limits.

Request without a limit — The container can consume unbounded memory, eventually exhausting the node and causing a node-level OOM.

Note: A container with no memory limit can consume all available memory on a node, triggering a node-level OOM event that can take down unrelated Pods on that node.
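
To make the overcommit risk concrete, the sketch below sums requests and limits for the containers on one node and compares them with the node’s allocatable memory. The function name and all figures are hypothetical; a real check would read these values from the cluster.

```python
def memory_overcommit(containers, node_allocatable_mib):
    """Compare summed requests and limits with what the node can provide.

    containers: list of (request_mib, limit_mib) tuples; limit_mib may be
    None for a container with no limit (treated as unbounded).
    """
    total_requests = sum(req for req, _ in containers)
    unbounded = any(limit is None for _, limit in containers)
    total_limits = None if unbounded else sum(limit for _, limit in containers)
    return {
        "requests_fit": total_requests <= node_allocatable_mib,
        "limits_fit": (not unbounded) and total_limits <= node_allocatable_mib,
        "limit_overcommit_ratio": (
            None if unbounded else round(total_limits / node_allocatable_mib, 2)
        ),
    }

# Hypothetical node: 4096 MiB allocatable, ten containers at 256Mi/512Mi.
report = memory_overcommit([(256, 512)] * 10, node_allocatable_mib=4096)
print(report)
```

In this example the requests fit, so the scheduler places all ten containers, but the limits add up to 1.25 times the node’s memory. If every container approaches its limit at once, the node runs out before any single limit is breached.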

How cgroups enforce memory boundaries

cgroups (control groups) are a Linux kernel feature that enforces resource boundaries on processes. Kubernetes uses cgroups to implement memory limits at the container level. When a container is created, Kubernetes configures a cgroup with a memory ceiling matching the value in the container spec. The kernel monitors usage against that boundary continuously. The instant usage crosses it, the OOM Killer fires.
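
The boundary can be inspected from inside a running container. The sketch below assumes cgroup v2 with the unified hierarchy mounted at /sys/fs/cgroup, where the kernel exposes memory.current and memory.max; on cgroup v1 nodes the files are named differently (memory.usage_in_bytes and memory.limit_in_bytes), so treat the paths as an assumption about your environment. The function name is hypothetical.

```python
from pathlib import Path

def read_cgroup_memory(root="/sys/fs/cgroup"):
    """Return (current_bytes, limit_bytes) from cgroup v2 files.

    limit_bytes is None when memory.max contains "max" (no limit set).
    """
    base = Path(root)
    current = int((base / "memory.current").read_text().strip())
    raw_limit = (base / "memory.max").read_text().strip()
    limit = None if raw_limit == "max" else int(raw_limit)
    return current, limit

if __name__ == "__main__":
    try:
        used, limit = read_cgroup_memory()
    except (OSError, ValueError):
        print("cgroup v2 memory files not found at the assumed path")
    else:
        print(f"used={used} limit={limit}")
```

A container started with a 512Mi limit should report a limit of 536870912 bytes here, the same number the kernel enforces.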

How QoS classes affect OOMKilled behavior

Kubernetes assigns every pod a Quality of Service (QoS) class based on how the CPU and memory requests and limits of its containers are configured. Under node memory pressure, QoS class influences which pods get killed first.

| QoS Class | Assigned When | Kill Priority |
| --- | --- | --- |
| Guaranteed | Every container sets CPU and memory requests equal to limits | Last to be killed |
| Burstable | At least one container sets a request or limit, but the pod does not meet the Guaranteed criteria | Killed after BestEffort |
| BestEffort | No container has requests or limits set | First to be killed |

Rule: A pod whose containers define no requests or limits at all is BestEffort and is the first target the OOM Killer selects under node pressure. A container with a memory request but no limit is Burstable, yet it can still grow until the node runs out of memory. Always define both requests and limits on every container, including sidecars and init containers.
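
The classification can be sketched in a few lines of Python. This is a simplified illustration, not the kubelet’s implementation: the function name is hypothetical, quantities are compared as plain strings, and it follows the documented Kubernetes rule that Guaranteed requires both CPU and memory requests to equal limits on every container.

```python
def qos_class(containers):
    """Classify a pod from its containers' resources.

    containers: list of dicts shaped like a container's `resources` field,
    e.g. {"requests": {"memory": "256Mi"}, "limits": {"memory": "512Mi"}}.
    Quantities are compared as strings, which is a simplification.
    """
    any_set = False
    guaranteed = True
    for res in containers:
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is given, Kubernetes defaults the request to it.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"

print(qos_class([{}]))
print(qos_class([{"requests": {"memory": "256Mi"}, "limits": {"memory": "512Mi"}}]))
```

The two calls print BestEffort and Burstable. Note that a single unconfigured sidecar is enough to keep an otherwise well-specified pod out of the Guaranteed class.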

Common causes of OOMKilled in Kubernetes

Kubernetes OOMKilled is caused by one of six things: the memory limit is set too low for real workload, the application has a memory leak, a traffic spike pushed usage above the limit, a JVM allocated heap based on node memory instead of container limit, a sidecar consumed unexpected memory, or an init container was never given a limit. Identifying which one applies determines whether you raise the limit, fix the code, or reconfigure the runtime.

1. Memory limit set too low

The most common cause. The limit defined in the container spec does not reflect actual memory usage under real production load. A limit that works during development or low-traffic testing will breach the moment the application processes realistic concurrent load.

Fix: Observe actual memory usage under realistic load using kubectl top pod or a monitoring tool like Middleware. Set the limit at roughly 20–30% above the observed 99th-percentile peak to accommodate normal variance.

2. Memory leaks in application code

A memory leak means the application allocates memory during execution and never releases it. In a container, a leak does not cause an immediate crash. Memory climbs gradually until it crosses the limit, the kernel fires, the container restarts, the leak begins again. The cycle repeats until CrashLoopBackOff.

Memory leak fingerprint in monitoring:

  • RSS (Resident Set Size) climbs continuously over time
  • Memory resets only on container restart
  • Usage never returns to baseline after traffic returns to normal

Critical: Increasing the memory limit on a container with a memory leak is not a fix. It only delays the next OOMKilled event. Profile the application first.

Profiling tools by runtime:

| Runtime | Tool |
| --- | --- |
| JVM | Java Flight Recorder, async-profiler |
| Go | Built-in pprof package |
| Node.js | --inspect flag + Chrome DevTools heap snapshots |
| Python | memory-profiler, tracemalloc |

3. Sudden traffic spikes and burst memory usage

An application that handles memory efficiently under normal load can still be OOMKilled during a traffic spike. A sudden surge in concurrent requests forces the application to allocate more memory than the limit accounts for. Limits must be set to accommodate the highest realistic memory usage, not the average.

This is why load testing before production deployment is non-negotiable. A limit that holds under development traffic may not hold under peak production concurrency.

4. JVM and garbage-collected runtimes

JVM-based applications carry a specific OOMKilled risk in Kubernetes. By default, the JVM sizes its heap as a fraction of the total memory visible to it. On older JVMs that are not container-aware, that means the JVM sees the node’s total memory rather than the container’s memory limit, and can choose a maximum heap larger than the limit itself.

The fix is explicit heap configuration:

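env:
  # JAVA_OPTS only takes effect if the image's entrypoint passes it to java;
  # the JVM itself reads JAVA_TOOL_OPTIONS directly.
  - name: JAVA_OPTS
    value: "-Xms256m -Xmx768m"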
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"    # ~30% headroom above -Xmx for off-heap

Rule: Never set the container memory limit equal to -Xmx. Off-heap allocations (thread stacks, metaspace, native libraries) consume memory beyond the heap. A 25–30% buffer above -Xmx is the standard starting point.

JVM container awareness: JDK 8u191+ and JDK 10+ are container-aware and read the cgroup memory limit directly via -XX:+UseContainerSupport, which is enabled by default in those versions. This allows the JVM to size its heap based on the container limit rather than the node’s total memory, but an explicit -Xmx is still recommended for predictable behavior.

5. Sidecar containers consuming shared pod memory

Sidecars (log forwarders, service mesh proxies, monitoring agents) run alongside the main container in the same pod. If a sidecar has no memory limit, it can consume enough memory to contribute to a node-level OOM that kills unrelated pods. If its limit is set too low, the sidecar itself is OOMKilled and restarts repeatedly.

Every container in a pod needs explicit memory requests and limits. Sidecars are not exempt.

6. Misconfigured init containers

Init containers run before the main application container starts and are routinely overlooked in memory planning. An init container performing a database schema migration, large data import, or cryptographic key generation can spike to several hundred MB for a brief period.

  • No limit set → it consumes whatever is available on the node
  • Limit set too low → it is OOMKilled before the pod ever starts

Apply the same memory limit discipline to init containers as to every other container.

How to detect OOMKilled in Kubernetes

To confirm a container was OOMKilled, run kubectl describe pod <pod-name> and look for Reason: OOMKilled in the Last State block. That single field is the definitive confirmation, and it persists even after the container has restarted and is running again. Here is the full five-step process for a complete diagnosis.

Step 1: Check Pod last state

The fastest confirmation is the lastState.terminated field, which persists even after the container has restarted and is running again.

kubectl describe pod <pod-name>

Look for this block:

Last State:   Terminated
  Reason:     OOMKilled
  Exit Code:  137
Restart Count: 3

All four fields together constitute definitive confirmation. A pod that was OOMKilled will show the following in its status:

  • Last State: Terminated confirms the container did not exit cleanly
  • Reason: OOMKilled confirms memory as the cause of termination
  • Exit Code: 137 confirms the kernel sent SIGKILL
  • Restart Count shows how many times the container has been killed and restarted

A restart count above zero warrants investigation. A restart count of three or more in a short window indicates the root cause has not been resolved.
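
The same check can be scripted against the JSON form of the pod (for example the output of kubectl get pod <pod-name> -o json). The sketch below is a minimal illustration: the function name and the sample object are hypothetical, while the field names (containerStatuses, lastState.terminated.reason, restartCount) are the ones the Kubernetes API uses.

```python
import json

def oomkilled_containers(pod):
    """Return (container name, restart count) for containers whose last
    termination reason was OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            hits.append((status["name"], status.get("restartCount", 0)))
    return hits

# Hypothetical, trimmed-down pod object for illustration.
sample = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "app", "restartCount": 3,
       "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "sidecar", "restartCount": 0, "lastState": {}}
    ]
  }
}
""")
print(oomkilled_containers(sample))  # [('app', 3)]
```

Checking per container matters in multi-container pods, where only one of several containers may be hitting its limit.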

Step 2: Pull logs from the terminated container

kubectl logs <pod-name> --previous

OOMKilled containers are terminated by SIGKILL with no warning. Logs end abruptly with no error message, no stack trace, no shutdown message. An abrupt log ending is consistent with OOMKilled and distinguishes it from application crashes that do produce a final log line.

Step 3: Check cluster events

Cluster events capture pod terminations, restarts, and evictions at the namespace level and give a timeline of what happened across all pods simultaneously.

kubectl get events --sort-by='.lastTimestamp'

Important: Events are retained for only 1 hour by default. An OOMKilled event that occurred more than an hour ago will not appear here. For historical event analysis, you need a persistent observability platform like Middleware.

Step 4: Check real-time memory usage

kubectl top requires Metrics Server to be installed. If it is not present, the command returns an error and you need to query your observability platform directly.

# All pods across all namespaces
kubectl top pod --all-namespaces

# Per-container breakdown for a specific pod
kubectl top pod <pod-name> --containers

For a complete setup, auto-discovery of pods and nodes, real-time metrics, and centralized logs, see Monitoring Kubernetes Applications with Middleware.

Step 5: Distinguish container-level from node-level OOM

If multiple unrelated pods are going down at the same time:

kubectl describe node <node-name>

Look for MemoryPressure: True in the Conditions section. If it is active, you have a node-level OOM problem that cannot be fixed by adjusting any single container’s limit.
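
When checking many nodes, the condition can be read from the node’s JSON (for example kubectl get node <node-name> -o json) rather than by eye. A minimal sketch, assuming the standard status.conditions list; the function name and sample objects are hypothetical.

```python
def has_memory_pressure(node):
    """Return True if the node reports the MemoryPressure condition as active."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "MemoryPressure":
            return condition.get("status") == "True"
    return False

# Hypothetical, trimmed-down node objects.
healthy = {"status": {"conditions": [{"type": "MemoryPressure", "status": "False"}]}}
pressured = {"status": {"conditions": [{"type": "MemoryPressure", "status": "True"}]}}
print(has_memory_pressure(healthy), has_memory_pressure(pressured))  # False True
```

Condition status values are strings ("True", "False", "Unknown"), so the comparison is against the string rather than a boolean.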

Memory Metrics: What to Watch and Why

Three memory metrics are most relevant for detecting and predicting OOMKilled events. Each measures a different aspect of container memory consumption and each points to a different cause.

| Metric | What It Measures | OOMKilled Relevance |
| --- | --- | --- |
| RSS (Resident Set Size) | Physical memory actively used by container processes | Rising RSS approaching the limit is the clearest predictor of an imminent OOMKilled event |
| Working Set Memory | RSS + cache that cannot be reclaimed under pressure | The closest proxy for memory that counts against the limit; the kubelet also uses it for eviction decisions |
| Cache Memory | Recently accessed file data | Reclaimable under pressure; high cache is normal and does not directly cause OOMKilled |

Related: For a deeper look at Kubernetes observability strategy, see our guide on Kubernetes monitoring best practices.

How to fix OOMKilled in Kubernetes

The fix for OOMKilled depends entirely on why the container exceeded its limit. If the limit is too low for legitimate usage, raise it to the 99th-percentile peak plus a 20–30% buffer. If the application is leaking memory, raising the limit only delays the next kill; you must profile and fix the leak first. The four fixes below cover the root causes described above.

Fix 1: Right-size memory limits

Increasing the memory limit is the correct fix when the container’s memory usage is legitimate and the limit is simply undersized. It is not the correct fix when the container has a memory leak.

Right-sizing methodology:

  1. Collect working set memory data for a minimum of 7 days to capture weekly traffic patterns.
  2. Identify the 95th or 99th percentile of usage during that period.
  3. Set requests at the median usage value.
  4. Set limits at the 99th-percentile value plus 20–30% buffer.

Setting the limit exactly at the observed peak guarantees another OOMKilled event the next time traffic exceeds that level.

resources:
  requests:
    memory: "256Mi"    # Median observed usage
  limits:
    memory: "640Mi"    # 99th percentile + 25% buffer
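
The methodology above reduces to a small calculation. The sketch below is one way to do it, assuming working set samples in MiB have already been exported from a monitoring system; the function names, the nearest-rank percentile method, and the sample data are all illustrative choices.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def recommend_memory(samples_mib, buffer=0.25):
    """Suggest request and limit (MiB) from working set samples."""
    p99 = percentile(samples_mib, 99)
    return {
        "request_mib": math.ceil(statistics.median(samples_mib)),
        "p99_mib": p99,
        "limit_mib": math.ceil(p99 * (1 + buffer)),
    }

# Hypothetical samples: mostly 256 MiB with occasional peaks.
samples = [256] * 98 + [512, 600]
print(recommend_memory(samples))
```

For these samples the result is a 256 MiB request and a 640 MiB limit (a 512 MiB 99th percentile plus 25%). The single 600 MiB outlier sits above the 99th percentile but still under the buffered limit.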

Fix 2: Fix the memory leak (not just the limit)

Confirm the leak first. Use the fingerprint:

  • RSS climbs continuously and resets only on container restart → leak confirmed
  • Memory spikes with traffic and returns to baseline afterward → not a leak; resize the limit

Once confirmed, profile with the appropriate tool (see table above), identify the offending allocation path, and fix it in the application code or dependency. A higher limit on a leaking container only postpones the next kill.
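
The fingerprint can be turned into a rough automated check. The heuristic below is only a sketch, not a substitute for profiling: it compares the lowest memory reading early in a series with the lowest reading late in the series, on the assumption that a healthy service returns to roughly the same baseline between traffic peaks. The function name, the 10% tolerance, and the sample series are hypothetical.

```python
def looks_like_leak(samples_mib, tolerance=0.10):
    """Heuristic: compare the lowest usage early on with the lowest usage late.

    If the late trough sits more than `tolerance` above the early trough,
    memory is not returning to baseline between traffic peaks.
    """
    if len(samples_mib) < 4:
        raise ValueError("need at least 4 samples")
    window = max(1, len(samples_mib) // 4)
    early_trough = min(samples_mib[:window])
    late_trough = min(samples_mib[-window:])
    return late_trough > early_trough * (1 + tolerance)

leaking = [100, 150, 130, 180, 160, 210, 190, 240]  # baseline keeps rising
spiky = [100, 300, 100, 320, 100, 310, 105, 100]    # returns to baseline
print(looks_like_leak(leaking), looks_like_leak(spiky))  # True False
```

The series should come from a single container lifetime, since a restart resets memory and would hide the trend.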

Fix 3: Configure JVM heap limits correctly

env:
  - name: JAVA_OPTS
    value: "-Xms256m -Xmx768m"
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"    # 30% headroom above -Xmx

-XX:+UseContainerSupport is enabled by default in JDK 10+ and JDK 8u191+. You can also use -XX:MaxRAMPercentage=75.0 instead of an explicit -Xmx value. This sets the maximum heap to 75% of the container limit dynamically, preserving the 25% off-heap buffer automatically as the limit changes.
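
The relationship between heap size and container limit is simple arithmetic, sketched below. Both helper names are hypothetical, and the results are approximations of the sizing rule described here rather than a reproduction of the JVM’s internal ergonomics.

```python
import math

def heap_from_limit(limit_mib, max_ram_percentage=75.0):
    """Approximate max heap (MiB) for a container limit and MaxRAMPercentage."""
    return int(limit_mib * max_ram_percentage / 100)

def limit_for_heap(xmx_mib, headroom=0.30):
    """Smallest container limit (MiB) leaving `headroom` above -Xmx."""
    return math.ceil(xmx_mib * (1 + headroom))

print(heap_from_limit(1024))  # 768
print(limit_for_heap(768))    # 999
```

A 1Gi limit with MaxRAMPercentage=75.0 yields roughly the same 768 MiB heap as the explicit -Xmx768m above, and working backward from -Xmx768m with 30% headroom gives 999 MiB, which rounds up naturally to the 1Gi limit used in the example.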

Fix 4: Use VPA to automate right-sizing

The Vertical Pod Autoscaler monitors actual resource usage and automatically adjusts memory requests and limits to match observed behavior, removing the need to manually right-size after every workload change.

The three VPA update modes most relevant here:

| Mode | Behavior | When to Use |
| --- | --- | --- |
| Off | Generates recommendations; applies nothing | Start here: validate before committing |
| Initial | Applies recommendations only when a pod is first created | Safe for stable workloads |
| Auto | Applies recommendations continuously; restarts pods as needed | Production use after Off-mode validation |

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"    # Start here — collect recommendations for 2+ weeks

Tip: Run VPA in Off mode for at least two weeks before switching to Auto. This gives you a validated baseline and prevents VPA from restarting pods based on insufficient data.

Prevention of OOM-killer error and best practices

To prevent OOMKilled in Kubernetes: set explicit memory requests and limits on every container (including sidecars and init containers), monitor working set memory trends with alerts at 80% of the limit, load test under realistic traffic before deploying, and use VPA or LimitRanges to enforce safe defaults automatically. The specific practices below cover each of those pillars.

Set memory requests and limits on every container — including sidecars and init containers. A container without a limit can consume unbounded memory and trigger a node-level OOM.

Never set limits far above requests. A large gap means the node may not have enough physical memory to honor all limits simultaneously if multiple containers try to use their maximums at once.

Use namespace-level LimitRanges as a safety net. A LimitRange sets default memory requests and limits for every container in a namespace that does not define its own, preventing unguarded deployments.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
    type: Container

Monitor memory usage trends and alert before the limit is reached — not after. A container whose working set memory climbs steadily week over week will eventually be OOMKilled. Proactive alerting at 80% of the memory limit gives you time to act.
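
The 80% threshold check itself is straightforward once the limit is converted to bytes. The sketch below is illustrative only: the function names are hypothetical, and the parser handles just the common binary (Ki, Mi, Gi) and decimal (k, M, G) suffixes rather than the full Kubernetes quantity syntax.

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
         "k": 1000, "M": 1000**2, "G": 1000**3}

def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "512Mi" to bytes.

    Handles only the suffixes in UNITS plus plain byte counts.
    """
    # Try two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * UNITS[suffix])
    return int(quantity)

def should_alert(working_set_bytes: int, limit: str, threshold=0.80) -> bool:
    """True once usage reaches `threshold` of the configured limit."""
    return working_set_bytes >= threshold * parse_memory(limit)

print(should_alert(430 * 1024**2, "512Mi"))  # True: about 84% of the limit
print(should_alert(400 * 1024**2, "512Mi"))  # False: about 78% of the limit
```

In practice this comparison would live in an alert rule in your monitoring platform; the point of the sketch is that the threshold is relative to each container’s own limit, not a fixed byte value.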

Load test before going to production. A limit that holds under development traffic will breach under realistic production concurrency. Realistic load testing eliminates the most common sizing mistakes before they become incidents.

For JVM workloads, always set -Xmx explicitly and leave 25–30% headroom in the container memory limit for off-heap usage.

Use QoS class Guaranteed for critical workloads by setting requests equal to limits. This makes critical pods the last to be killed during node-level memory pressure.

Set ResourceQuotas at the namespace level to prevent any single team or deployment from exhausting cluster memory capacity.

Middleware detects rising memory trends before the kernel fires. Get real-time OOMKilled alerts, memory usage tracking, and full pod context in one place. Start monitoring Kubernetes

OOMKilled, CrashLoopBackOff, exit code 143, and pod eviction are four distinct things that are frequently confused because they can all appear in the same incident. OOMKilled is a termination reason. CrashLoopBackOff is a pod state. Exit code 143 is a graceful shutdown. Pod eviction is the kubelet acting proactively in response to node MemoryPressure. The table below separates them clearly.

| Error | Cause | Initiated By | Warning Before Kill | Exit Code | Fix |
| --- | --- | --- | --- | --- | --- |
| OOMKilled | Container exceeded memory limit | Linux kernel | None | 137 | Adjust memory limit or fix leak |
| CrashLoopBackOff | Repeated container failures | Kubernetes | None | Varies (often 1 or 137) | Investigate last state reason |
| Exit Code 143 | Graceful shutdown via SIGTERM | Kubernetes | SIGTERM | 143 | Expected; no fix needed |
| Pod Eviction | Node memory pressure | Kubelet | SIGTERM | None | Check node memory pressure |

OOMKilled vs. CrashLoopBackOff

CrashLoopBackOff is a pod state, not a termination reason. It describes a container that keeps crashing repeatedly. OOMKilled is a termination reason that describes why a container crashed. A container that is OOMKilled repeatedly will eventually enter CrashLoopBackOff. The pod status tells you the container keeps failing. The last state tells you why.

OOMKilled vs. Exit Code 143

Exit code 143 is produced when a container receives SIGTERM and exits gracefully. OOMKilled is produced when the kernel sends SIGKILL with no warning. Exit code 143 is expected behavior during rolling updates, scale-down events, and node drains. Exit code 137 always requires investigation.

OOMKilled vs. Pod Eviction

Pod eviction is initiated by the kubelet when a node reports MemoryPressure. OOMKilled is initiated by the Linux kernel when a container breaches its cgroup memory limit. An evicted pod receives SIGTERM before SIGKILL. An OOMKilled container receives no warning. Eviction is Kubernetes acting to prevent a node-level OOM. OOMKilled is what happens when that prevention is not enough.

How Middleware helps you detect and resolve OOMKilled faster

The hardest part of OOMKilled is that by the time the kill fires, the evidence is already gone: logs end abruptly, events expire in an hour, and kubectl top shows current state, not history. Middleware solves this by continuously tracking memory trends across every pod, alerting before the limit is breached, and storing the full context (restart history, memory curve, cluster events) so you have everything needed to diagnose root cause without reconstructing it from scratch.

Middleware is a full-stack observability platform built for teams running containerized applications on Kubernetes. Without any manual instrumentation or custom queries, Middleware tracks RSS, working set memory, and container restart counts across every pod in your cluster in real time.

When memory usage climbs toward a container’s configured limit, Middleware flags it before the kernel kills the process. When an OOMKilled event does occur, Middleware ties the termination to the container’s memory trend, restart history, and cluster events so you have the full context needed to diagnose and fix the problem fast.

  • Real-time memory tracking across every pod and namespace
  • Automatic OOMKilled detection with container restart correlation
  • Trend-based alerts that fire before the limit is breached
  • Unified metrics, logs, and traces in a single dashboard
  • OpsAI: Middleware’s AI SRE agent that auto-detects memory anomalies, correlates them with upstream changes, and surfaces root-cause recommendations without requiring a manual investigation

Related: See how OpsAI handles production incidents automatically in our guide to reducing alert noise with OpsAI.

Stop diagnosing OOMKilled from scratch. Middleware gives you memory trends, restart history, and OpsAI-powered root cause analysis in one place. Start monitoring your Kubernetes workloads with Middleware

FAQs

Can OOMKilled happen even if I do not set a memory limit?

A container without a memory limit will not be OOMKilled by a container-level breach because no limit exists to breach. It can still be OOMKilled at the node level if total memory consumption exhausts the node’s physical memory. BestEffort containers (which have no requests or limits defined) are the first targets the OOM Killer selects under node memory pressure.

Does exit code 137 always mean OOMKilled?

No. Exit code 137 is produced whenever a process is terminated by SIGKILL, not exclusively when a container exceeds its memory limit. A container force-killed after a liveness probe failure, a manual docker kill, or a node shutdown can also produce exit code 137. OOMKilled is confirmed only when exit code 137 appears alongside Reason: OOMKilled in the container’s last state.

How can I tell if a container was OOMKilled?

The definitive confirmation is Reason: OOMKilled in the lastState.terminated field of the container status, visible via kubectl describe pod. This field persists even after the container has restarted and is running again. A rising restart count alongside Reason: OOMKilled confirms the container is hitting its memory limit repeatedly.

Will Kubernetes automatically restart my container after an OOMKilled event?

Kubernetes restarts an OOMKilled container according to the pod’s restartPolicy, which defaults to Always for most workloads. Automatic restart does not fix the underlying cause. A container restarted after OOMKilled will be OOMKilled again if the memory pressure is not resolved.

What happens to in-flight requests when a container is OOMKilled?

In-flight requests are dropped immediately when a container is OOMKilled. The kernel sends SIGKILL with no warning, giving the application no opportunity to finish processing or return a response to the client. Unlike a graceful shutdown triggered by SIGTERM, OOMKilled provides no window for cleanup or connection draining.

What is the Linux OOM Killer and how does Kubernetes use it?

The Linux OOM Killer is a kernel subsystem that terminates processes when the system runs out of available memory. Kubernetes uses cgroups to set memory limits at the container level. When a container exceeds its cgroup limit, the kernel invokes the OOM Killer against that container’s process directly, without evaluating other processes on the node first.

Does OOMKilled affect the entire pod or just one container?

OOMKilled terminates only the container that exceeded its memory limit, not the entire pod. Other containers in the same pod continue running, and Kubernetes restarts only the affected container. If the same container is OOMKilled repeatedly, the pod eventually enters CrashLoopBackOff, but other containers in the pod are not terminated.

How do I prevent OOMKilled during a Kubernetes rolling update?

During a rolling update, new pods start before old pods are terminated. If the cluster is already near memory capacity, the brief period where both old and new pods are running can push a node into memory pressure. To prevent this, set appropriate PodDisruptionBudgets, ensure node capacity has enough headroom for the surge, and configure maxSurge and maxUnavailable in your deployment strategy to control how many pods run concurrently during the update.

Can Horizontal Pod Autoscaler prevent OOMKilled?

HPA scales the number of pods based on CPU, memory utilization, or custom metrics; it does not change any container’s memory limit. Adding replicas can spread load, but it will not prevent OOMKilled when a single container exceeds its configured limit because of a leak or an undersized limit. The right tool for memory-based right-sizing is the Vertical Pod Autoscaler (VPA), which adjusts memory requests and limits based on actual observed usage.

How does OOMKilled interact with Kubernetes probes?

Exec probes run inside the container and are subject to the same memory constraints, and an application under memory pressure may respond too slowly to HTTP or TCP probes. As a result, probes may fail before the OOM Killer fires, causing Kubernetes to restart the container due to a failed liveness probe rather than OOMKilled. Both scenarios can produce exit code 137, but for different reasons. Always check Reason: OOMKilled in lastState.terminated to distinguish between a probe failure kill and a memory limit kill.

What is the difference between OOMKilled and MemoryPressure in Kubernetes?

MemoryPressure is a node condition reported by the kubelet when available memory on the node falls below a configured threshold. When MemoryPressure is active, the kubelet begins evicting BestEffort pods proactively. OOMKilled is what happens at the container level when a specific container breaches its cgroup memory limit. MemoryPressure is a warning. OOMKilled is the kernel acting after the warning was not sufficient.

How do I monitor OOMKilled events over time?

kubectl get events only retains events for one hour by default, which is insufficient for trend analysis. For historical OOMKilled tracking, you need a persistent observability platform. Middleware automatically captures and stores OOMKilled events alongside memory trend data and restart history, giving you a complete picture of how often containers are hitting their limits, which workloads are most affected, and whether the trend is worsening over time.