Common Queries
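CPU usage per pod in cores (container!="" excludes cAdvisor's pod-level total, which would otherwise be double-counted)
sum(rate(container_cpu_usage_seconds_total{namespace="default", container!=""}[5m])) by (pod)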
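Memory usage per pod in bytes (container!="" excludes cAdvisor's pod-level total, which would otherwise be double-counted)
sum(container_memory_working_set_bytes{namespace="default", container!=""}) by (pod)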
HTTP error rate (fraction of requests returning 5xx, from 0 to 1)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
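A per-service variant, assuming the counter carries a job label (Prometheus attaches one at scrape time). Grouping both sides by the same label keeps numerator and denominator matched one-to-one. The status label name is taken from the query above; some instrumentation libraries (for example the Go client's promhttp) name it code instead.

```promql
# 5xx fraction per job; both sides grouped by job so they divide one-to-one
sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
  /
sum by (job) (rate(http_requests_total[5m]))
```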
95th percentile latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
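To get p95 per service rather than one overall figure, keep le in the grouping clause next to the label you split by; histogram_quantile uses le to identify bucket boundaries. A sketch assuming a job label:

```promql
# le must survive the aggregation, otherwise histogram_quantile has no buckets to work with
histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```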
Node disk usage % (root filesystem)
100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"})
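The same calculation can cover every mounted filesystem instead of only /. This sketch assumes you want to skip pseudo-filesystems; the fstype values below are typical examples and may need adjusting for your nodes.

```promql
# Both sides carry identical labels (device, fstype, mountpoint, ...), so they divide one-to-one
100 - (
  node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} * 100
    /
  node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
)
```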
Pod restarts in last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0
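To report restarts per pod rather than per container, aggregate before filtering. This sketch assumes the namespace and pod labels exported by kube-state-metrics:

```promql
# Sum container restarts per pod, then keep only pods that restarted at least once
sum by (namespace, pod) (increase(kube_pod_container_status_restarts_total[1h])) > 0
```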
Node CPU utilization %
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Requests per second
sum(rate(http_requests_total[1m])) by (job)
Function Reference
rate(v range-vector)
Per-second average rate of increase of a counter over the range; counter resets are adjusted for automatically.
irate(v range-vector)
Per-second instant rate, calculated from the last two samples in the range.
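A side-by-side comparison on the same counter: rate averages over the whole window and suits alerts and slow-moving counters, while irate reacts to the most recent change and is mainly useful for graphing volatile, fast-moving counters. The metric is the one used in the queries above.

```promql
# Smoothed: average per-second rate over 5 minutes
rate(http_requests_total[5m])

# Spiky: per-second rate between the last two samples inside the 5m window
irate(http_requests_total[5m])
```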
sum(v) by (label)
Sum values across series, grouped by the listed label(s).
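The grouping clause can go before or after the argument, and without is the inverse of by: it drops the listed labels and keeps all others. The first two queries below are equivalent forms of the requests-per-second query; the third is a sketch assuming instance is the only label you want to collapse.

```promql
# Clause after the argument (style used in this sheet)
sum(rate(http_requests_total[1m])) by (job)

# Same query with the clause first
sum by (job) (rate(http_requests_total[1m]))

# Drop only the instance label, keep every other label
sum without (instance) (rate(http_requests_total[1m]))
```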
avg / min / max / count
Other aggregation operators, same syntax as sum.
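For example, count and avg take exactly the same grouping syntax. These sketches reuse metrics from the queries above; up is the per-target health series Prometheus records on every scrape (1 if the scrape succeeded).

```promql
# Number of targets currently up, per job
count by (job) (up == 1)

# Average idle CPU fraction per node
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```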
increase(v range-vector)
Increase of a counter over the range; extrapolated to the window edges, so results can be non-integer.
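increase is shorthand for rate multiplied by the number of seconds in the range, so the two queries below should return the same values apart from floating-point rounding. This assumes the http_requests_total counter from the queries above.

```promql
# Requests per job over the last hour
sum by (job) (increase(http_requests_total[1h]))

# Equivalent formulation via rate (3600 seconds in 1h)
sum by (job) (rate(http_requests_total[1h])) * 3600
```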
histogram_quantile(p, v)
Estimate the p-quantile (e.g. 0.95) from histogram buckets; v is usually a rate of _bucket series and must still carry the le label.
topk(n, v) / bottomk(n, v)
Top/bottom N series by value.
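A typical use is ranking the heaviest consumers. This sketch builds on the CPU-per-pod query and assumes you want the five busiest pods across all namespaces; the container!="" filter drops cAdvisor's pod-level aggregate series.

```promql
# Five pods with the highest CPU usage (cores), cluster-wide
topk(5, sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m])))
```

In a graph (range query) topk is evaluated at every step, so more than five distinct series can appear across the full time range.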
label_replace(v, dst, repl, src, regex)
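Set label dst to repl (which can reference capture groups such as $1) on each series whose src label fully matches regex; non-matching series are returned unchanged.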
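The example below copies the host part of the instance label into a new label. The regex is anchored, so it must match the whole value, and $1 refers to the first capture group. The label name host is hypothetical.

```promql
# instance="10.0.0.5:9100" -> host="10.0.0.5"; series whose instance does not match pass through unchanged
label_replace(up, "host", "$1", "instance", "(.*):.*")
```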