Cluster Health Dashboard

The dashboard is the first screen you see after logging into the Skipper web console. It provides a real-time overview of your cluster's health, showing the status of every infrastructure component, node resource usage, and any recent stability issues.

What the dashboard shows

Component status

The dashboard monitors 13 infrastructure components that make up a Skipper cluster:

Component     What it does
------------  -------------------------------------
k3s           Lightweight Kubernetes distribution
Traefik       Ingress controller and reverse proxy
cert-manager  Automatic TLS certificate management
Longhorn      Distributed block storage
KEDA          Event-driven autoscaling
Loki          Log aggregation
Promtail      Log collection agent
Prometheus    Metrics collection and alerting
Grafana       Metrics visualisation
Velero        Cluster backup and restore
Dex           Identity and authentication provider
Console API   Backend API for the web console
Console       The web console frontend

Each component shows a green or red indicator:

  • Green means the component has at least one healthy replica running.
  • Red means the component is not found or has zero available replicas.
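The indicator rule is simple enough to reproduce at the CLI. A minimal sketch, assuming a hypothetical `component_status` helper fed with the standard Deployment field `.status.availableReplicas` (the component and namespace in the comment are examples):

```shell
# Hypothetical helper mirroring the dashboard's indicator rule:
# green if at least one replica is available, red otherwise.
component_status() {
  local available="$1"
  if [ "${available:-0}" -ge 1 ]; then
    echo green
  else
    echo red
  fi
}

# Feed it a replica count, e.g. from:
#   kubectl get deploy traefik -n kube-system \
#     -o jsonpath='{.status.availableReplicas}'
component_status 2   # green
component_status 0   # red
component_status ""  # red (component not found)
```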

Node information

The nodes table shows every node in your cluster with:

  • Status: whether the node is Ready or NotReady.
  • CPU usage: current CPU consumption reported by metrics-server.
  • Memory usage: current memory consumption reported by metrics-server.
  • Disk usage: reported when available, otherwise shown as "n/a".

CPU and memory values require metrics-server to be installed (it is included in a standard Skipper installation). If metrics-server is unavailable, the values display as "n/a".
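The same per-node figures are available at the CLI via kubectl top nodes. A sketch that parses its standard column layout; the sample output below is illustrative, and in practice you would pipe in the live command instead:

```shell
# Sample of the `kubectl top nodes` output format produced by metrics-server.
sample='NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1    250m         12%    1843Mi          48%
node-2    120m         6%     912Mi           23%'

# Extract the percentage columns the dashboard displays
# (skip the header row; columns 3 and 5 are CPU% and MEMORY%).
echo "$sample" | awk 'NR > 1 { print $1, "cpu=" $3, "mem=" $5 }'
```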

OOM-killed pods

An amber warning banner appears if any pod was terminated due to running out of memory (OOMKilled) in the last 24 hours. Each entry shows the pod name, namespace, and the time the kill occurred.

OOM kills typically indicate that an application needs more memory than its resource limit allows. To resolve this:

  1. Open the app's resource settings in the console or run kip resources set.
  2. Increase the memory limit.
  3. Monitor the dashboard to confirm the kills stop.
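The banner's detection can be approximated from the CLI by checking each container's last termination reason. A sketch over sample data; the jsonpath query in the comment shows one way to produce real input (it only inspects the first container of each pod, and the namespaces and pod names here are made up):

```shell
# Sample lines in "namespace/pod reason" form. Real input could come from:
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
sample='apps/web-7f9c OOMKilled
apps/worker-5d2a Completed
monitoring/loki-0 OOMKilled'

# Keep only pods whose last termination was an OOM kill.
echo "$sample" | awk '$2 == "OOMKilled" { print $1 }'
```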

Node resource pressure

Two cards show real-time memory and CPU utilisation for the cluster:

  • Utilisation bar: colour-coded green (<70%), amber (70-85%), or red (>85%)
  • Sparkline chart: a trend line showing the last hour of usage at a glance
  • Totals: current usage vs allocatable capacity

When memory exceeds 80%, the resource controller generates a warning alert with the top consumers and any anomalies. At 90%+, alerts are marked critical.
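The thresholds above can be sketched as a small classifier; the alert_level name is hypothetical, and only the 80%/90% cut-offs come from the controller's documented behaviour:

```shell
# Map a memory-utilisation percentage to the controller's alert level:
# over 80% -> warning, 90% or more -> critical.
alert_level() {
  local pct="$1"
  if [ "$pct" -ge 90 ]; then
    echo critical
  elif [ "$pct" -gt 80 ]; then
    echo warning
  else
    echo ok
  fi
}

alert_level 75   # ok
alert_level 85   # warning
alert_level 92   # critical
```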

Below the resource cards, a table lists every Skipper-managed workload with:

  • Current memory: how much the workload is using right now
  • Sparkline: a mini trend chart of the last hour. Blue means stable, amber means growing, red means anomaly
  • Change indicator: percentage growth compared to ~10 minutes ago. Workloads that grew more than 30% are flagged as anomalies and highlighted in red

Anomalies are sorted to the top of the list, making it easy to spot a workload that is leaking memory or experiencing unexpected load growth before the node runs out of resources.
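The change indicator reduces to a percentage comparison against the reading from roughly ten minutes earlier. A minimal sketch with hypothetical helper names, using integer memory values (e.g. MiB):

```shell
# Integer percentage change from an old reading to the current one.
growth_pct() {
  echo $(( (($2 - $1) * 100) / $1 ))
}

# Flag growth above 30% as an anomaly, per the rule described above.
is_anomaly() {
  if [ "$(growth_pct "$1" "$2")" -gt 30 ]; then
    echo anomaly
  else
    echo ok
  fi
}

is_anomaly 100 120   # ok      (+20%)
is_anomaly 100 140   # anomaly (+40%)
```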

Resource management mode

The dashboard displays the current resource management mode (Auto or Expert) in the header area. In auto mode, the resource controller runs in the background and adjusts CPU/memory for your apps based on actual usage. A summary of recent auto-mode changes is shown on the dashboard when available. See Resource Management for details on how the controller works and how to switch between modes.

Auto-refresh

The dashboard refreshes automatically every 30 seconds. You can also click the Refresh button to fetch the latest data immediately.

Troubleshooting

If a component shows red, check the following:

  1. Verify the component is deployed. Run kubectl get pods -n <namespace> to see if the pod exists.
  2. Check pod logs. Run kubectl logs -n <namespace> <pod-name> for error messages.
  3. Check events. Run kubectl get events -n <namespace> --sort-by=.lastTimestamp for recent issues.
  4. Restart the component. Run kubectl rollout restart deployment/<name> -n <namespace> if the pod is stuck.

If all components show red and you have just installed the cluster, wait a few minutes for everything to start. Initial provisioning can take 2-5 minutes depending on your server.

Released under the Apache 2.0 License.