OOM Killed: What It Means and How to Fix It
OOM killed means the Linux kernel's Out of Memory killer terminated your process because the system (or the process's cgroup) ran out of available memory. The kernel had two options — freeze the entire machine or kill something. It killed your process. If you're seeing exit code 137, an OOM kill is almost certainly the reason.
# How the Linux OOM killer works
Linux overcommits memory by default. When a process calls malloc(), the kernel hands back a virtual address range without actually reserving physical pages. It assumes most processes won't use everything they request. This works well until too many processes actually touch their allocated memory at the same time, and the kernel has no free pages left to back the promises it made.
At that point, the OOM killer activates. It scores every running process, picks the one with the highest score, and sends it SIGKILL (signal 9). No warning, no chance to clean up. The process dies instantly.
How processes get scored
Every process has an OOM score visible at /proc/<pid>/oom_score. The score is roughly proportional to the percentage of physical memory the process consumes — a process using 10% of RAM gets a score around 100, one using 50% gets around 500. The range is 0 to 1000.
The kernel also factors in whether the process is privileged (root processes score slightly lower) and how the administrator has tuned the score via oom_score_adj.
```bash
# Check a process's OOM score
cat /proc/$(pidof my-app)/oom_score

# Check its adjustment value
cat /proc/$(pidof my-app)/oom_score_adj
```

The oom_score_adj parameter ranges from -1000 to +1000. Setting it to -1000 makes the process effectively immune to the OOM killer. Setting it to +1000 makes it the first target. Only root can decrease the value.
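The two per-process files can be combined into a quick survey of which processes the kernel would target first. A sketch, assuming a Linux host with /proc mounted:

```bash
# Rank processes by OOM score, highest (first to be killed) on top
for p in /proc/[0-9]*; do
  printf "%5s %s\n" "$(cat "$p/oom_score" 2>/dev/null)" \
                    "$(cat "$p/comm" 2>/dev/null)"
done | sort -rn | head -5
```

Run it when memory pressure starts building and you can usually predict the kernel's next victim before it happens.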
```bash
# Protect a critical process (requires root)
echo -500 | sudo tee /proc/$(pidof postgres)/oom_score_adj

# Make a process more likely to be killed
echo 500 > /proc/$(pidof cache-warmer)/oom_score_adj
```

Memory overcommit modes
The kernel's overcommit behavior is controlled by vm.overcommit_memory:
- 0 (default) — Heuristic overcommit. The kernel guesses whether a memory allocation is reasonable and allows most requests, even if physical memory is not fully available.
- 1 — Always overcommit. Every malloc() succeeds regardless of available memory. The OOM killer becomes the only safety net.
- 2 — Never overcommit. The kernel refuses allocations that would push total commitments past swap plus a configurable percentage of physical RAM (vm.overcommit_ratio). Processes get clean allocation failures instead of OOM kills — but applications that rely on overcommit will break.
```bash
# Check current overcommit mode
cat /proc/sys/vm/overcommit_memory

# Switch to strict mode (no overcommit)
sudo sysctl vm.overcommit_memory=2
```

Most containerized workloads run with mode 0, and memory limits are enforced via cgroups rather than overcommit settings.
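Under mode 2, the ceiling the kernel enforces is CommitLimit, visible in /proc/meminfo next to the running total of promises already made (Linux only):

```bash
# CommitLimit = swap + RAM * vm.overcommit_ratio%; Committed_AS = total committed so far
grep -E "^(CommitLimit|Committed_AS)" /proc/meminfo
```

When Committed_AS approaches CommitLimit in mode 2, new allocations start failing cleanly instead of triggering the OOM killer.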
# OOM killed in Docker
Docker uses Linux cgroups to enforce memory limits. When you pass --memory to docker run, the kernel creates a cgroup with a hard memory ceiling. If the container's processes collectively exceed that ceiling, the kernel's OOM killer terminates the offending process inside the cgroup — not on the host.
```bash
# Run with a 512MB memory limit
docker run --memory=512m --memory-swap=512m my-app
```

When --memory-swap equals --memory, the container gets no swap space. If you omit --memory-swap, the container can use swap equal to its memory limit (so 512m memory + 512m swap = 1024m total). Setting --memory-swap=-1 gives unlimited swap, which defeats the purpose of memory limits in most cases.
Confirming a Docker OOM kill
```bash
# Did the OOM killer get this container?
docker inspect my-container --format='{{.State.OOMKilled}}'
# true

# Full state output
docker inspect my-container --format='{{json .State}}' | jq
```

The inspect output looks like this when OOM is the cause:
```json
{
  "Status": "exited",
  "Running": false,
  "OOMKilled": true,
  "ExitCode": 137
}
```

If OOMKilled is false but the exit code is still 137, something else sent SIGKILL — docker kill, a health check timeout, or host-level memory pressure killed the container from the outside.
Monitoring container memory
```bash
# Live memory usage for all containers
docker stats

# One-shot snapshot
docker stats --no-stream

# Host kernel logs showing OOM events
dmesg | grep -i "oom\|killed process"
```

The dmesg output for an OOM kill looks roughly like this:
```
[123456.789] my-app invoked oom-killer: gfp_mask=0xcc0, order=0
[123456.790] Memory cgroup out of memory: Killed process 4521 (node)
             total-vm:1548672kB, anon-rss:524288kB, file-rss:12288kB
```
That anon-rss value is the resident memory at the time of the kill. Compare it to your --memory limit.
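The kernel reports sizes in kB, so a quick division shows how close the process was to its ceiling. For the anon-rss above:

```bash
# 524288 kB of anonymous resident memory, converted to MB:
echo "$((524288 / 1024)) MB"   # 512 MB — exactly at a 512m --memory limit
```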
# OOM killed in Kubernetes
Kubernetes wraps Docker's (or containerd's) cgroup limits with its own resource model. You declare resources.requests and resources.limits in your pod spec, and the kubelet translates those into cgroup constraints.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest
      resources:
        requests:
          memory: "256Mi"
        limits:
          memory: "512Mi"
```

When the container exceeds 512Mi, the cgroup OOM killer terminates it. Kubernetes detects the SIGKILL, marks the container as OOMKilled, and — depending on your restartPolicy — restarts it. This is where the CrashLoopBackOff spiral begins if the process immediately consumes the same amount of memory on restart.
Diagnosing a Kubernetes OOM kill
```bash
# See the OOMKilled reason and exit code
kubectl describe pod my-app
```

The relevant section in the output:
```
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Wed, 05 Mar 2026 10:23:00 +0000
  Finished:     Wed, 05 Mar 2026 10:24:12 +0000
```
More useful commands:
```bash
# Logs from the previous container instance (before OOM)
kubectl logs my-app --previous

# Current memory usage per container
kubectl top pod my-app --containers

# Node-level memory pressure
kubectl describe node $(kubectl get pod my-app -o jsonpath='{.spec.nodeName}') | grep -A5 "Conditions"

# All OOM events in the cluster
kubectl get events --field-selector reason=OOMKilling --sort-by='.lastTimestamp'
```

QoS classes and eviction priority
Kubernetes assigns a Quality of Service class to each pod based on how you define resources. This determines the order in which pods get killed when the node itself is under memory pressure — distinct from a container hitting its own limit.
- Guaranteed — Requests equal limits for every container. Gets oom_score_adj of -997. Last to be evicted under node pressure.
- Burstable — Requests are set but lower than limits. Gets a calculated oom_score_adj between 2 and 999 based on the ratio of request to node capacity. Evicted after BestEffort.
- BestEffort — No requests or limits defined. Gets oom_score_adj of 1000. First to die when the node runs low.
```bash
# Check a pod's QoS class
kubectl get pod my-app -o jsonpath='{.status.qosClass}'
```

If your pods keep getting OOM killed and they are BestEffort, adding even minimal resource requests changes their eviction priority significantly. The jump from oom_score_adj=1000 to oom_score_adj=-997 is the difference between being the first and last process the kernel targets.
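For Burstable pods, the kubelet derives the adjustment from the ratio of memory request to node capacity, roughly 1000 - 1000 * request / nodeCapacity, clamped to [2, 999]. A sketch for a 256Mi request on an 8Gi node (illustrative values, integer arithmetic):

```bash
req_mib=256; node_mib=8192
adj=$((1000 - 1000 * req_mib / node_mib))
# Clamp to the valid Burstable range [2, 999]
[ "$adj" -lt 2 ] && adj=2
[ "$adj" -gt 999 ] && adj=999
echo "$adj"   # 969
```

The larger the request relative to the node, the lower (safer) the score, which is why even a small request is a meaningful improvement over BestEffort's flat 1000.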
# Finding which process was OOM killed
The kernel logs every OOM kill event. Where you find those logs depends on the system.
The kernel log entry names the killed process, its PID, and the memory stats at the time of death. It also dumps a table of all running processes and their memory consumption leading up to the kill — this table is your best diagnostic tool because it shows exactly what was consuming memory across the system.
For containers, the cgroup path in the log identifies which container was involved. In Kubernetes, the cgroup path includes the pod UID, making it traceable back to a specific pod.
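A hypothetical cgroup path from such a log line can be picked apart with shell parameter expansion. The path, pod UID, and container ID below are made up for illustration:

```bash
# Hypothetical cgroup path as it might appear in a kernel OOM log
path="/kubepods/burstable/pod1a2b3c4d-5e6f-7081-92a3-b4c5d6e7f809/3f2a0c1d9e8b"

uid=${path#*/pod}   # strip everything up to and including "/pod"
uid=${uid%%/*}      # strip the container ID after the UID
echo "$uid"
```

The extracted UID can then be matched against `kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.uid}{"\t"}{.metadata.name}{"\n"}{end}'` to name the pod.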
# How to fix and prevent OOM kills
Rule out memory leaks first
Raising limits is the obvious fix, but it just delays the crash if the application leaks memory. Profile first, then set limits.
Node.js — V8's heap grows until the OS kills the process unless you cap it. The --max-old-space-size flag sets a hard ceiling on the V8 heap. Without it, a Node.js process in a container with 512MB will happily try to allocate 1.5GB.
```bash
node --max-old-space-size=384 app.js
```

Python — tracemalloc tracks allocations back to source lines. Common leaks: global lists that accumulate records, Django querysets that cache entire result sets in memory, and C extensions allocating outside Python's allocator.
```python
import tracemalloc

tracemalloc.start()

# ... run your workload ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)
```

Java — The JVM needs memory well beyond the heap. Metaspace, thread stacks, JIT compiler buffers, and GC overhead can consume 25-30% of total memory. Never set -Xmx equal to the container memory limit.
```bash
# Let the JVM calculate heap based on the cgroup limit
java -XX:MaxRAMPercentage=75.0 -jar app.jar
```

Go — Since Go 1.19, GOMEMLIMIT tells the runtime to GC aggressively before hitting a hard limit. Set it to 80-90% of the container memory limit and the garbage collector will work harder to stay within bounds.
```bash
GOMEMLIMIT=400MiB ./my-app
```

See the Node.js deploy guide, Python deploy guide, and Go deploy guide for full configuration details.
Set container memory limits
Running containers without memory limits is asking for trouble. A single process can consume all available memory on a node, affecting every other workload.
Docker:
```bash
docker run -d \
  --memory=512m \
  --memory-swap=512m \
  --name my-app \
  my-app:latest
```

Kubernetes:
```yaml
resources:
  requests:
    memory: "256Mi"   # What the app normally uses (scheduler guarantee)
  limits:
    memory: "512Mi"   # Hard ceiling — set to 1.5-2x the request
```

Set requests to your application's steady-state memory consumption. Set limits to accommodate spikes with some headroom. If requests and limits are equal for every resource, the pod gets Guaranteed QoS — lowest eviction priority.
Right-size your limits with data
Guessing memory limits leads to either wasted resources or OOM kills. Measure actual usage under load, then add headroom.
```bash
# Docker — watch peak memory over time
docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"

# Kubernetes — current usage (requires metrics-server)
kubectl top pod --containers

# Prometheus query for peak memory over 24h
max_over_time(container_memory_usage_bytes{pod="my-app"}[24h])
```

Run your application under realistic load (not just startup), observe the peak, and set limits to 1.5x that peak. For JVM applications, account for non-heap memory by adding 30% above max heap.
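The 1.5x rule is simple integer arithmetic. For a measured peak of 340MiB (an illustrative number):

```bash
peak_mib=340
limit_mib=$((peak_mib * 3 / 2))   # 1.5x headroom over observed peak
echo "memory limit: ${limit_mib}Mi"   # memory limit: 510Mi
```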
Configure application-level memory limits
The container cgroup limit is the last line of defense. Application-level limits give the runtime a chance to GC, shed load, or fail gracefully before the kernel steps in.
| Runtime | Flag | Effect |
|---|---|---|
| Node.js | --max-old-space-size=384 | Caps V8 heap at 384MB |
| Java | -XX:MaxRAMPercentage=75.0 | Sets heap to 75% of cgroup limit |
| Go | GOMEMLIMIT=400MiB | Triggers aggressive GC at 400MB |
| Python | No built-in limit | Use monitoring + process managers |
| .NET | System.GC.HeapHardLimit | Hard heap ceiling |
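For .NET, the setting in the table lives in the app's runtimeconfig.json. A sketch for a 400MB hard heap ceiling (the byte value is 400 * 1024 * 1024):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.HeapHardLimit": 419430400
    }
  }
}
```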
Use multi-stage Docker builds
Dev dependencies, build tools, and package managers left in the final image don't directly cause OOM, but they add memory overhead at runtime if they load shared libraries or background processes. Multi-stage builds keep the production image lean. Our Docker deploy guide covers this in detail.
# OOM killed vs exit code 137
They are the same event seen from different perspectives. The OOM killer is the cause; exit code 137 is the symptom.
When the kernel's OOM killer sends SIGKILL to a process, the process exits with code 137 (calculated as 128 + signal 9). Docker reports OOMKilled: true in the container state. Kubernetes sets the termination reason to OOMKilled. The dmesg log shows Killed process <pid>.
Not every exit code 137 is an OOM kill, though. Any SIGKILL — manual kill -9, docker kill, a forced pod deletion — also produces exit code 137. The distinction matters for debugging. Check docker inspect for the OOMKilled flag, or kubectl describe pod for the OOMKilled reason, to confirm memory was the cause. For a deep dive on all the sources of exit code 137, see the companion article on exit code 137.
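The 128 + 9 arithmetic is easy to reproduce in a shell: kill a background process with signal 9 and read its exit status (any long-running command works; sleep is used here):

```bash
sleep 30 &
pid=$!
kill -9 "$pid"
# Capture the status via || so the non-zero exit doesn't abort under set -e
wait "$pid" || status=$?
echo "exit code: $status"   # exit code: 137 (128 + 9)
```

This is exactly the status Docker and Kubernetes observe; they just can't tell from the number alone whether the SIGKILL came from the OOM killer or somewhere else.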