
Create metric alerting rules

Define alerting conditions based on Prometheus metrics using PrometheusRule CRDs. Alerts are automatically picked up by the Prometheus Operator and routed to Alertmanager.

Prerequisites

  • UDS CLI installed
  • UDS Registry account created and authenticated locally with a read token
  • Access to a Kubernetes cluster with UDS Core deployed
  • Familiarity with PromQL

UDS Core ships default alerting rules from the upstream kube-prometheus-stack chart covering cluster health, node conditions, and platform components. Runbooks for these default rules are available at runbooks.prometheus-operator.dev. This guide covers creating custom rules for your applications and optionally tuning the defaults.

  1. Create a PrometheusRule

    Define a PrometheusRule custom resource containing one or more alert rules. The Prometheus Operator watches for these CRs and loads them into Prometheus automatically.

    my-app-alerts.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: my-app-alerts
      namespace: my-app
    spec:
      groups:
        - name: my-app
          rules:
            - alert: PodRestartingFrequently
              expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "Pod {{ $labels.pod }} is restarting frequently"
                runbook: "https://example.com/runbooks/pod-restart"
                description: "Pod restarted {{ $value }} times in the last hour"
            - alert: HighMemoryUsage
              expr: |
                (container_memory_working_set_bytes / on(namespace, pod, container) kube_pod_container_resource_limits{resource="memory"}) * 100 > 80
              for: 15m
              labels:
                severity: warning
              annotations:
                summary: "High memory usage detected"
                runbook: "https://example.com/runbooks/high-memory-usage"
                description: "Container using {{ $value }}% of memory limit"

    Key fields in each rule:

    • expr: PromQL expression that defines the alerting condition. When this expression returns results, the alert becomes active.
    • for: How long the condition must be continuously true before the alert fires. Prevents flapping on transient spikes.
    • labels.severity: Used by Alertmanager for routing. Common values are critical, warning, and info.
    • annotations: Human-readable context attached to the alert. Include a summary, description, and runbook URL to make alerts actionable.
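    To illustrate how the severity label drives routing, a minimal Alertmanager route might look like the sketch below. This is a hypothetical fragment, not UDS Core's actual Alertmanager configuration; the receiver names (pagerduty, slack) are assumptions for illustration.

    ```yaml
    route:
      receiver: default
      routes:
        # Page on-call only for critical alerts
        - matchers:
            - severity = "critical"
          receiver: pagerduty
        # Send warnings to a chat channel
        - matchers:
            - severity = "warning"
          receiver: slack
    receivers:
      - name: default
      - name: pagerduty
      - name: slack
    ```

    Consistent severity values across your rules are what make a routing tree like this manageable.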
  2. Deploy the rule

    (Recommended) Include the PrometheusRule in your Zarf package, then create and deploy the package. See Packaging applications for general packaging guidance.

    Terminal window
    uds zarf package create --confirm
    uds zarf package deploy zarf-package-*.tar.zst --confirm

    Or apply the PrometheusRule directly for quick testing:

    Terminal window
    uds zarf tools kubectl apply -f my-app-alerts.yaml

    The Prometheus Operator picks up PrometheusRule CRs automatically.

  3. Optional: Disable or tune default alert rules

    If default kube-prometheus-stack alerts are too noisy or not relevant to your environment, you can disable individual rules or entire rule groups through bundle overrides.

    uds-bundle.yaml
    packages:
      - name: core
        repository: registry.defenseunicorns.com/public/core
        ref: x.x.x-upstream
        overrides:
          kube-prometheus-stack:
            kube-prometheus-stack:
              values:
                # Disable specific individual rules by name
                - path: defaultRules.disabled
                  value:
                    KubeControllerManagerDown: true
                    KubeSchedulerDown: true
                # Disable entire rule groups with boolean toggles
                - path: defaultRules.rules.kubeControllerManager
                  value: false
                - path: defaultRules.rules.kubeSchedulerAlerting
                  value: false

    Use defaultRules.disabled for fine-tuned control over individual rules. Use defaultRules.rules.* to disable entire rule groups when broader changes are needed.

    Create and deploy your bundle:

    Terminal window
    uds create <path-to-bundle-dir>
    uds deploy uds-bundle-<name>-<arch>-<version>.tar.zst

Verify the rules

Open Grafana and navigate to Alerting > Alert rules. Filter by the Prometheus data source and confirm your custom rules appear in the list.

Check the rule state to understand its current status:

  • Inactive: condition is not met
  • Pending: condition is met but the for duration has not elapsed
  • Firing: active alert being sent to Alertmanager
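You can also inspect alert state directly in Prometheus: active alerts are exported through the built-in ALERTS metric, whose alertstate label distinguishes pending from firing. The alert name below matches the example rule from step 1; substitute your own.

```promql
# All active (pending or firing) alerts for the example rule
ALERTS{alertname="PodRestartingFrequently"}

# Only alerts that have been continuously true past their `for` duration
ALERTS{alertname="PodRestartingFrequently", alertstate="firing"}
```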

Troubleshooting

Symptom: Custom alert rules do not show up in the Grafana alerting UI.

Solution: Verify the PrometheusRule CR was created successfully and check for YAML syntax errors:

Terminal window
uds zarf tools kubectl get prometheusrule -A
uds zarf tools kubectl describe prometheusrule <name> -n <namespace>

Symptom: The PromQL expression should match, but the alert stays in Inactive state.

Solution: Verify the PromQL expression returns results in the Prometheus UI:

Terminal window
uds zarf connect prometheus

Navigate to the Graph tab and run your expr query directly. If it returns results, check that the for duration has elapsed, because the alert will remain in Pending state until the condition is continuously true for that period.
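To validate the whole pipeline end to end, one approach (a sketch, not an official UDS pattern) is to deploy a deliberately always-firing rule and confirm it reaches Alertmanager; the resource name and namespace below are placeholders.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test-alert        # hypothetical name; delete this resource after testing
  namespace: my-app
spec:
  groups:
    - name: pipeline-test
      rules:
        - alert: AlwaysFiring
          expr: vector(1) > 0   # always true, so the alert fires immediately
          for: 0m
          labels:
            severity: info
          annotations:
            summary: "Test alert to verify the alerting pipeline"
```

If AlwaysFiring reaches Alertmanager but your real rule does not, the problem is in the rule's expr or for settings rather than in rule discovery or routing.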