Grafana cheat sheet

A collection of tips and tricks for troubleshooting problems in Kubernetes clusters using data from the LGTM stack (Loki, Grafana, Tempo, Mimir).

Useful Mimir queries

Top 20 metrics by cardinality (number of series)

# Set time range to "Last 5 minutes"
topk(20, count by (__name__)({__name__=~".+"}))

Top 10 namespaces with overallocated CPU (requested minus used)

# container!="" skips the pod-level cgroup series, which would otherwise double-count usage
topk(10,
  sum by (namespace) (kube_pod_container_resource_requests{job="integrations/kubernetes/kube-state-metrics", resource="cpu"})
  - sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[$__rate_interval]))
)

Overallocated CPU per container, filtered by namespace

sum by (container)
(kube_pod_container_resource_requests{job="integrations/kubernetes/kube-state-metrics", resource="cpu", namespace=~"matrikkel.*"})
- sum by (container) (rate(container_cpu_usage_seconds_total{namespace=~"matrikkel.*"}[$__rate_interval]))

Daily number of requests by destination app and response code

sum by (destination_app, response_code) (
  increase(istio_requests_total{namespace="<namespace name>", response_code=~".*", source_app="istio-ingress-external"}[1d])
)

Useful Loki queries