Create metric alerting rules
What you’ll accomplish
Define alerting conditions based on Prometheus metrics using PrometheusRule CRDs. Alerts are automatically picked up by the Prometheus Operator and routed to Alertmanager.
Prerequisites
- UDS CLI installed
- UDS Registry account created and authenticated locally with a read token
- Access to a Kubernetes cluster with UDS Core deployed
- Familiarity with PromQL
Before you begin
UDS Core ships default alerting rules from two sources. The upstream kube-prometheus-stack chart provides cluster and node health alerts, and UDS Core provides default probe alerts for endpoint downtime and TLS certificate expiry.
Runbooks for upstream defaults are available at runbooks.prometheus-operator.dev. This guide covers creating custom rules for your applications and optionally tuning either default set.
1. Create a PrometheusRule
Define a `PrometheusRule` custom resource containing one or more alert rules. The Prometheus Operator watches for these CRs and loads them into Prometheus automatically.

```yaml
# my-app-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-app
spec:
  groups:
    - name: my-app
      rules:
        - alert: PodRestartingFrequently
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting frequently"
            runbook: "https://example.com/runbooks/pod-restart"
            description: "Pod restarted {{ $value }} times in the last hour"
        - alert: HighMemoryUsage
          expr: |
            (container_memory_working_set_bytes
              / on(namespace, pod, container)
              kube_pod_container_resource_limits{resource="memory"}
            ) * 100 > 80
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "High memory usage detected"
            runbook: "https://example.com/runbooks/high-memory-usage"
            description: "Container using {{ $value }}% of memory limit"
```

Key fields in each rule:
- `expr`: PromQL expression that defines the alerting condition. When this expression returns results, the alert becomes active.
- `for`: How long the condition must be continuously true before the alert fires. Prevents flapping on transient spikes.
- `labels.severity`: Used by Alertmanager for routing. Common values are `critical`, `warning`, and `info`.
- `annotations`: Human-readable context attached to the alert. Include a `summary`, `description`, and `runbook` URL to make alerts actionable.
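Rule logic can be sanity-checked locally with `promtool test rules` before deploying. Note that promtool operates on plain Prometheus rule files (the `groups:` block from the `spec` above, extracted to its own file), not on the CRD wrapper; the filenames and input series below are illustrative. A minimal sketch of a unit test for the `PodRestartingFrequently` alert:

```yaml
# my-app-alerts-test.yaml (hypothetical filename)
rule_files:
  # The `groups:` block from the PrometheusRule spec, saved as a plain rule file
  - rules.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # A restart counter that increases by 1 every minute (~60 restarts/hour)
      - series: 'kube_pod_container_status_restarts_total{pod="my-app-0",container="app"}'
        values: '0+1x120'
    alert_rule_test:
      # At 70m, increase(...[1h]) is well above 5 and the 5m `for`
      # window has elapsed, so the alert should be firing.
      - eval_time: 70m
        alertname: PodRestartingFrequently
        exp_alerts:
          - exp_labels:
              severity: warning
              pod: my-app-0
              container: app
```

Run it with `promtool test rules my-app-alerts-test.yaml`; a failing expectation prints the actual alerts at the evaluation time, which is useful for debugging thresholds and `for` durations.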
2. Deploy the rule
(Recommended) Include the PrometheusRule in your Zarf package and create/deploy. See Packaging applications for general packaging guidance.
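The packaging step can be as small as a single manifests entry in your Zarf package definition. A sketch, with placeholder package and component names:

```yaml
# zarf.yaml (illustrative fragment)
kind: ZarfPackageConfig
metadata:
  name: my-app
components:
  - name: my-app-alerts
    required: true
    manifests:
      # Applies the PrometheusRule CR alongside the rest of the component
      - name: my-app-alerts
        namespace: my-app
        files:
          - my-app-alerts.yaml
```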
```shell
uds zarf package create --confirm
uds zarf package deploy zarf-package-*.tar.zst --confirm
```

Or apply the PrometheusRule directly for quick testing:

```shell
uds zarf tools kubectl apply -f my-app-alerts.yaml
```

The Prometheus Operator picks up PrometheusRule CRs automatically.
3. (Optional) Disable or tune default alert rules
If default alerts are too noisy or not relevant to your environment, you can tune both upstream kube-prometheus-stack and UDS Core defaults through bundle overrides.
UDS Core default probe alerts can be tuned or disabled as follows:
```yaml
# uds-bundle.yaml
overrides:
  kube-prometheus-stack:
    uds-prometheus-config:
      values:
        # Disable all UDS Core probe default alerts
        - path: udsCoreDefaultAlerts.enabled
          value: false
        # Disable the Endpoint Down alert
        - path: udsCoreDefaultAlerts.probeEndpointDown.enabled
          value: false
        # Tune threshold and severity for TLS expiry warning alerts
        - path: udsCoreDefaultAlerts.probeTLSExpiryWarning.days
          value: 21
        - path: udsCoreDefaultAlerts.probeTLSExpiryWarning.severity
          value: warning
        # Tune threshold and severity for TLS expiry critical alerts
        - path: udsCoreDefaultAlerts.probeTLSExpiryCritical.days
          value: 7
        - path: udsCoreDefaultAlerts.probeTLSExpiryCritical.severity
          value: critical
```

Upstream kube-prometheus-stack default rules can be disabled as follows:
```yaml
# uds-bundle.yaml
overrides:
  kube-prometheus-stack:
    kube-prometheus-stack:
      values:
        # Disable specific individual rules by name
        - path: defaultRules.disabled
          value:
            KubeControllerManagerDown: true
            KubeSchedulerDown: true
        # Disable entire rule groups with boolean toggles
        - path: defaultRules.rules.kubeControllerManager
          value: false
        - path: defaultRules.rules.kubeSchedulerAlerting
          value: false
```

Use `defaultRules.disabled` for fine-grained control over individual upstream rules. Use `defaultRules.rules.*` to disable entire upstream rule groups when broader changes are needed.

Create and deploy your bundle:
```shell
uds create <path-to-bundle-dir>
uds deploy uds-bundle-<name>-<arch>-<version>.tar.zst
```
Verification
Open Grafana and navigate to Alerting > Alert rules. Filter by the Prometheus datasource. Confirm your custom rules appear in the list.
Check the rule state to understand its current status:
- Inactive: condition is not met
- Pending: condition is met but the `for` duration has not elapsed
- Firing: active alert being sent to Alertmanager
Troubleshooting
Rule not appearing in Grafana
Symptom: Custom alert rules do not show up in the Grafana alerting UI.
Solution: Verify the PrometheusRule CR was created successfully and check for YAML syntax errors:
```shell
uds zarf tools kubectl get prometheusrule -A
uds zarf tools kubectl describe prometheusrule <name> -n <namespace>
```

Alert not firing when expected
Symptom: The PromQL expression should match, but the alert stays in Inactive state.
Solution: Verify the PromQL expression returns results in the Prometheus UI:
```shell
uds zarf connect prometheus
```

Navigate to the Graph tab and run your `expr` query directly. If it returns results, check that the `for` duration has elapsed, because the alert will remain in Pending state until the condition is continuously true for that period.
Related documentation
- Prometheus: Alerting rules - PromQL alerting rule syntax
- Prometheus: Alerting best practices - guidance on alert design
- Prometheus Operator: PrometheusRule API - full CRD field reference
- Default rule runbooks - troubleshooting guides for kube-prometheus-stack alerts
- Route alerts to notification channels - Configure Alertmanager to deliver your alerts to Slack, PagerDuty, or email.
- Create log-based alerting and recording rules - Complement metric alerts with log pattern detection using Loki Ruler.