Create metric alerting rules

Define alerting conditions based on Prometheus metrics using PrometheusRule CRDs. Alerts are automatically picked up by the Prometheus Operator and routed to Alertmanager.

  • UDS CLI installed
  • UDS Registry account created and authenticated locally with a read token
  • Access to a Kubernetes cluster with UDS Core deployed
  • Familiarity with PromQL

UDS Core ships default alerting rules from two sources. The upstream kube-prometheus-stack chart provides cluster and node health alerts, and UDS Core provides default probe alerts for endpoint downtime and TLS certificate expiry.

Runbooks for upstream defaults are available at runbooks.prometheus-operator.dev. This guide covers creating custom rules for your applications and optionally tuning either default set.

  1. Create a PrometheusRule

    Define a PrometheusRule custom resource containing one or more alert rules. The Prometheus Operator watches for these CRs and loads them into Prometheus automatically.

    my-app-alerts.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: my-app-alerts
      namespace: my-app
    spec:
      groups:
        - name: my-app
          rules:
            - alert: PodRestartingFrequently
              expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "Pod {{ $labels.pod }} is restarting frequently"
                runbook: "https://example.com/runbooks/pod-restart"
                description: "Pod restarted {{ $value }} times in the last hour"
            - alert: HighMemoryUsage
              expr: |
                (container_memory_working_set_bytes / on(namespace, pod, container) kube_pod_container_resource_limits{resource="memory"}) * 100 > 80
              for: 15m
              labels:
                severity: warning
              annotations:
                summary: "High memory usage detected"
                runbook: "https://example.com/runbooks/high-memory-usage"
                description: "Container using {{ $value }}% of memory limit"

    Key fields in each rule:

    • expr: PromQL expression that defines the alerting condition. When this expression returns results, the alert becomes active.
    • for: How long the condition must be continuously true before the alert fires. Prevents flapping on transient spikes.
    • labels.severity: Used by Alertmanager for routing. Common values are critical, warning, and info.
    • annotations: Human-readable context attached to the alert. Include a summary, description, and runbook URL to make alerts actionable.
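    Beyond threshold rules, a common pattern is alerting when a metric disappears entirely, for example when every scrape target for a job stops reporting. A minimal sketch of such a rule, assuming a hypothetical job label `my-app` (the alert name, job label, and runbook URL are illustrative, not part of UDS Core):

    ```yaml
    # Hypothetical rule: fires when Prometheus has no `up` series
    # for the my-app job, i.e. all scrape targets have vanished.
    - alert: MyAppTargetsMissing
      expr: absent(up{job="my-app"})
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "No my-app scrape targets are reporting"
        description: "Prometheus has no up series for job my-app"
        runbook: "https://example.com/runbooks/my-app-targets-missing"
    ```

    Note that a plain threshold expression returns nothing when the underlying metric is missing, so `absent()` is the standard way to catch that case.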
  2. Deploy the rule

    (Recommended) Include the PrometheusRule manifest in your Zarf package, then create and deploy the package. See Packaging applications for general packaging guidance.
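    One way to package the rule is as a plain manifest in a Zarf component. A sketch of the relevant portion of a zarf.yaml, assuming a package and namespace named `my-app` (package, component, and manifest names here are illustrative):

    ```yaml
    # zarf.yaml (sketch): ships my-app-alerts.yaml as a Kubernetes manifest
    kind: ZarfPackageConfig
    metadata:
      name: my-app
    components:
      - name: my-app-alerts
        required: true
        manifests:
          - name: alerts
            namespace: my-app
            files:
              - my-app-alerts.yaml
    ```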

    Terminal window
    uds zarf package create --confirm
    uds zarf package deploy zarf-package-*.tar.zst --confirm

    Or apply the PrometheusRule directly for quick testing:

    Terminal window
    uds zarf tools kubectl apply -f my-app-alerts.yaml

    The Prometheus Operator picks up PrometheusRule CRs automatically.

  3. (Optional) Disable or tune default alert rules

    If default alerts are too noisy or not relevant to your environment, you can tune both upstream kube-prometheus-stack and UDS Core defaults through bundle overrides.

    UDS Core default probe alerts can be tuned or disabled as follows:

    uds-bundle.yaml
    overrides:
      kube-prometheus-stack:
        uds-prometheus-config:
          values:
            # Disable all UDS Core probe default alerts
            - path: udsCoreDefaultAlerts.enabled
              value: false
            # Disable the Endpoint Down alert
            - path: udsCoreDefaultAlerts.probeEndpointDown.enabled
              value: false
            # Tune threshold and severity for TLS expiry warning alerts
            - path: udsCoreDefaultAlerts.probeTLSExpiryWarning.days
              value: 21
            - path: udsCoreDefaultAlerts.probeTLSExpiryWarning.severity
              value: warning
            # Tune threshold and severity for TLS expiry critical alerts
            - path: udsCoreDefaultAlerts.probeTLSExpiryCritical.days
              value: 7
            - path: udsCoreDefaultAlerts.probeTLSExpiryCritical.severity
              value: critical

    Upstream kube-prometheus-stack default rules can be disabled as follows:

    uds-bundle.yaml
    overrides:
      kube-prometheus-stack:
        kube-prometheus-stack:
          values:
            # Disable specific individual rules by name
            - path: defaultRules.disabled
              value:
                KubeControllerManagerDown: true
                KubeSchedulerDown: true
            # Disable entire rule groups with boolean toggles
            - path: defaultRules.rules.kubeControllerManager
              value: false
            - path: defaultRules.rules.kubeSchedulerAlerting
              value: false

    Use defaultRules.disabled for fine-tuned control over upstream individual rules. Use defaultRules.rules.* to disable upstream rule groups when broader changes are needed.

    Create and deploy your bundle:

    Terminal window
    uds create <path-to-bundle-dir>
    uds deploy uds-bundle-<name>-<arch>-<version>.tar.zst

Open Grafana and navigate to Alerting > Alert rules. Filter by the Prometheus datasource. Confirm your custom rules appear in the list.

Check the rule state to understand its current status:

  • Inactive: condition is not met
  • Pending: condition is met but the for duration has not elapsed
  • Firing: active alert being sent to Alertmanager
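You can also inspect alert state directly in Prometheus, which exports each active alert as a synthetic ALERTS series whose alertstate label is pending or firing (an Inactive alert produces no series). For example, using the alert name from the earlier example:

```promql
# Current state of the custom alert; an empty result means Inactive
ALERTS{alertname="PodRestartingFrequently"}
```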

Symptom: Custom alert rules do not show up in the Grafana alerting UI.

Solution: Verify the PrometheusRule CR was created successfully and check for YAML syntax errors:

Terminal window
uds zarf tools kubectl get prometheusrule -A
uds zarf tools kubectl describe prometheusrule <name> -n <namespace>

Symptom: The PromQL expression should match, but the alert stays in Inactive state.

Solution: Verify the PromQL expression returns results in the Prometheus UI:

Terminal window
uds zarf connect prometheus

Navigate to the Graph tab and run your expr query directly. If it returns results, check that the for duration has elapsed, because the alert will remain in Pending state until the condition is continuously true for that period.
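When an expression returns nothing, it can help to strip it down and rebuild it piece by piece. Run each of the following queries separately in the Graph tab (metric and namespace names here match the earlier example and are illustrative):

```promql
# 1. Confirm the raw series exists for your namespace
kube_pod_container_status_restarts_total{namespace="my-app"}

# 2. Apply the range function without the comparison
increase(kube_pod_container_status_restarts_total{namespace="my-app"}[1h])

# 3. Re-add the threshold; an empty result here means the alert stays Inactive
increase(kube_pod_container_status_restarts_total{namespace="my-app"}[1h]) > 5
```

If step 1 returns nothing, the metric is not being scraped at all and the problem is with the scrape target, not the alert rule.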