A running container does not guarantee a working application: it may be deadlocked, running out of memory, or still initialising. Probes are the Kubernetes (K8s) mechanism for checking application health.
Why Probes Are Needed
Without probes, K8s considers a Pod alive as long as the process inside has not exited. The problem:
- The application is running but not responding (deadlock, hang)
- The application is not yet ready to accept traffic (loading data, establishing connections)
- The application starts slowly and K8s kills it before it finishes initialising
Three probe types solve three different problems.
livenessProbe: Restart If Dead
liveness checks: “Is the application alive?” If the check fails failureThreshold times in a row, K8s restarts the container.
Use when: the application may deadlock or hang and you need an automatic restart.
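A liveness endpoint only helps if it can actually notice a hang. One common approach is to have the main loop record a heartbeat and let the health handler report failure once that heartbeat goes stale. The sketch below is an illustration under assumptions: the `Heartbeat` class, its `max_age` parameter and the `liveness_status` helper are hypothetical names, not part of Kubernetes or any framework.

```python
import time


class Heartbeat:
    """Tracks when the worker loop last made progress (hypothetical helper)."""

    def __init__(self, max_age: float, clock=time.monotonic):
        self.max_age = max_age      # seconds without progress before we report "dead"
        self._clock = clock         # injectable clock, handy for testing
        self._last_beat = clock()

    def beat(self) -> None:
        """Call this from the main loop every time it completes a unit of work."""
        self._last_beat = self._clock()

    def is_alive(self) -> bool:
        return (self._clock() - self._last_beat) <= self.max_age


def liveness_status(heartbeat: Heartbeat) -> int:
    """HTTP status a /healthz handler could return: 200 if alive, 503 if stale."""
    return 200 if heartbeat.is_alive() else 503
```

A handler for /healthz would return `liveness_status(...)` as its status code, so a deadlocked loop eventually produces failing checks and a restart.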
readinessProbe: Don’t Send Traffic If Not Ready
readiness checks: “Is the application ready to accept requests?” If not, the Pod is removed from the Service’s load-balancing pool and receives no traffic.
The Pod stays in the Running state but gets no traffic. As soon as the check passes, the Pod is returned to rotation.
Use when: the application is initialising (connecting to the database, warming caches, loading configs).
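The effect of readiness on traffic can be pictured as a filter over Pods. The following is a toy model, not how Kubernetes is implemented: the `Pod` dataclass and `service_endpoints` function are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    phase: str   # e.g. "Running"
    ready: bool  # outcome of the readiness probe


def service_endpoints(pods: list[Pod]) -> list[str]:
    """Names of Pods that would receive traffic: Running and ready."""
    return [p.name for p in pods if p.phase == "Running" and p.ready]


pods = [
    Pod("app-1", "Running", ready=True),
    Pod("app-2", "Running", ready=False),  # still warming up: Running, but no traffic
]
endpoints = service_endpoints(pods)
```

Here `app-2` stays in the Running phase but is left out of `endpoints` until its readiness check passes.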
startupProbe: For Slow Startup
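startup gives the application time to initialise. Until startupProbe succeeds, liveness and readiness probes do not run. Once it succeeds, startupProbe is disabled for the rest of the container’s life. If it does not succeed within roughly failureThreshold × periodSeconds, K8s kills the container and the Pod’s restartPolicy applies.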
Use when: the application takes a long time to start (legacy systems, JVM, large ML models).
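Sizing a startupProbe is simple arithmetic: the time budget is roughly failureThreshold multiplied by periodSeconds. The helpers below are a minimal sketch with hypothetical function names; they ignore initialDelaySeconds and per-check timeouts, so treat the result as an approximation.

```python
import math


def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time the startup probe allows before the container is killed."""
    return failure_threshold * period_seconds


def min_failure_threshold(worst_case_startup_seconds: float, period_seconds: int) -> int:
    """Smallest failureThreshold whose budget covers the worst-case startup time."""
    return math.ceil(worst_case_startup_seconds / period_seconds)
```

For example, an application that may need up to 120 seconds to start, checked every 5 seconds, needs a failureThreshold of at least 24; adding some headroom on top is usually sensible.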
Three Check Mechanisms
HTTP Check (most common)
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
      - name: X-Health-Check
        value: "true"
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
K8s sends a GET request to /healthz. HTTP 2xx/3xx = success; anything else = failure.
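The success rule can be written as a one-line predicate, which is handy when unit-testing a health endpoint. The function name below is hypothetical; it only encodes the status range stated above.

```python
def http_probe_succeeded(status_code: int) -> bool:
    """True for any status from 200 up to, but not including, 400."""
    return 200 <= status_code < 400


# Classify a few representative status codes.
outcomes = {code: http_probe_succeeded(code) for code in (200, 204, 302, 404, 503)}
```

Note that a 503 from a health endpoint counts as a failed check, which is why returning 503 is a common way to signal "not healthy".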
TCP Check
livenessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 15
  periodSeconds: 10
Checks that the port accepts TCP connections. Convenient for databases and other non-HTTP services.
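To get a feel for what a TCP check does and does not prove, here is a rough Python equivalent. It is a sketch for local experimentation, not the kubelet's implementation; `tcp_probe` is a hypothetical name.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

A successful connection only shows that something is listening on the port. It says nothing about whether the service behind it can answer queries, which is why exec or HTTP checks are often preferred when they are available.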
Exec Check (Command Inside the Container)
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - "redis-cli ping | grep -q PONG"
  initialDelaySeconds: 10
  periodSeconds: 5
Runs a command inside the container. Exit code 0 = success.
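The exit-code rule is easy to reproduce locally when testing a probe command before putting it in a manifest. The sketch below uses a hypothetical `exec_probe` helper and treats a timeout or a missing binary as a failure, which is an assumption made for this illustration.

```python
import subprocess
import sys


def exec_probe(command: list[str], timeout: float = 1.0) -> bool:
    """Run the command; succeed only if it exits with code 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return result.returncode == 0


# Example: a command that exits with code 0, using the current Python interpreter.
ok = exec_probe([sys.executable, "-c", "raise SystemExit(0)"], timeout=10.0)
```

Keep exec commands cheap: every check starts a new process inside the container, and it must finish within timeoutSeconds.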
Probe Parameters
| Parameter | Meaning | Default |
|---|---|---|
| initialDelaySeconds | Wait N seconds after container start | 0 |
| periodSeconds | How often to check | 10 |
| timeoutSeconds | Timeout for one check | 1 |
| failureThreshold | Consecutive failures before action | 3 |
| successThreshold | Consecutive successes needed for recovery | 1 |
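The two thresholds work as consecutive counters. The generator below is a simplified model of that behaviour, written for illustration only: `probe_states` is a hypothetical name and the real kubelet logic has more detail. It fits readiness best, since a failed liveness probe leads to a restart rather than a recovery.

```python
def probe_states(results, failure_threshold=3, success_threshold=1, initially_healthy=True):
    """Yield the healthy/unhealthy state after each check result.

    results: iterable of booleans, True for a passed check.
    """
    healthy = initially_healthy
    failures = successes = 0
    for passed in results:
        if passed:
            successes += 1
            failures = 0  # a success resets the failure streak
            if not healthy and successes >= success_threshold:
                healthy = True
        else:
            failures += 1
            successes = 0  # a failure resets the success streak
            if healthy and failures >= failure_threshold:
                healthy = False
        yield healthy


# Two failures are tolerated; the third consecutive failure flips the state.
history = list(probe_states([True, False, False, True, False, False, False, True]))
```

With the defaults, detecting a problem takes about failureThreshold × periodSeconds, so 3 × 10 = 30 seconds.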
Practical Example: FastAPI Application
# main.py: health check endpoints
import os

import asyncpg
from fastapi import FastAPI, HTTPException

app = FastAPI()
db_pool = None  # created on startup


@app.on_event("startup")
async def startup():
    # Open the connection pool once, before the app starts serving requests
    global db_pool
    db_pool = await asyncpg.create_pool(os.getenv("DATABASE_URL"))


@app.get("/healthz")
async def liveness():
    """Application is alive: the process is running and can answer requests"""
    return {"status": "ok"}


@app.get("/readyz")
async def readiness():
    """Application is ready: the database is reachable"""
    try:
        async with db_pool.acquire() as conn:
            await conn.fetchval("SELECT 1")
        return {"status": "ready"}
    except Exception:
        # Any failure (including a missing pool) means "not ready"
        raise HTTPException(status_code=503, detail="Database unavailable")
# deployment-with-probes.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastapi-app
  template:
    metadata:
      labels:
        app: fastapi-app
    spec:
      containers:
        - name: app
          image: fastapi-app:1.0
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          # Give time for database connection
          startupProbe:
            httpGet:
              path: /readyz
              port: 8000
            failureThreshold: 30 # 30 * 5s = 150 seconds maximum
            periodSeconds: 5
          # Check that the application is alive
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8000
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          # Check readiness to accept traffic
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8000
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
            successThreshold: 1
Examples for nginx and PostgreSQL
# nginx: liveness via TCP
livenessProbe:
  tcpSocket:
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
# PostgreSQL: check via pg_isready
livenessProbe:
  exec:
    command:
      - pg_isready
      - -U
      - postgres
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6
readinessProbe:
  exec:
    command:
      - pg_isready
      - -U
      - postgres
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
Debugging Probe Issues
# View pod events (reasons for restarts)
kubectl describe pod my-app-abc123
# Events:
# Warning Unhealthy Liveness probe failed: HTTP probe failed...
# Warning BackOff Back-off restarting failed container
# Restart count
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# my-app-abc123 0/1 Running 5 10m ← problem
# Logs from the previous (crashed) container
kubectl logs my-app-abc123 --previous
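When there are many Pods, scanning the RESTARTS column by eye gets tedious. `kubectl get pods -o json` exposes the same counter as `restartCount` under each Pod's `status.containerStatuses`, so a short script can flag the offenders. The sketch below assumes the JSON has already been parsed into a dict; `pods_with_restarts` is a hypothetical helper and `sample` is hand-written data shaped like that output, not real cluster output.

```python
def pods_with_restarts(pod_list: dict, min_restarts: int = 1) -> dict[str, int]:
    """Map pod name to total container restarts for pods at or above min_restarts."""
    flagged = {}
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        total = sum(s.get("restartCount", 0) for s in statuses)
        if total >= min_restarts:
            flagged[pod["metadata"]["name"]] = total
    return flagged


# Hand-written sample in the shape of `kubectl get pods -o json` (illustrative only).
sample = {
    "items": [
        {
            "metadata": {"name": "my-app-abc123"},
            "status": {"containerStatuses": [{"name": "app", "restartCount": 5}]},
        },
        {
            "metadata": {"name": "my-app-def456"},
            "status": {"containerStatuses": [{"name": "app", "restartCount": 0}]},
        },
    ]
}
restarting = pods_with_restarts(sample)
```

In practice the dict could come from `json.loads` applied to the command's output; Pods flagged this way are the ones worth inspecting with `kubectl describe pod`.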
Common mistakes:
- initialDelaySeconds is too small: the probe fires before the application is ready
- timeoutSeconds is too small: a slow endpoint does not respond in time
- liveness and readiness share the same slow endpoint: K8s kills an overloaded pod
- No startupProbe for a slow application: liveness kills it during startup