Set up uptime monitoring
What you’ll accomplish
Monitor HTTPS endpoint availability using Blackbox Exporter probes. Probes are configured through the `uptime` block of the UDS `Package` CR. The UDS Operator automatically creates the Prometheus Probe resources and configures Blackbox Exporter. You can monitor simple health checks, custom paths, and even Authservice-protected applications without additional setup.
Prerequisites
- UDS CLI installed
- Access to a Kubernetes cluster with UDS Core deployed
- An application exposed via the `Package` CR `expose` block
Before you begin
- Add uptime checks to a `Package` CR

  Add `uptime.checks.paths` to an `expose` entry in your `Package` CR. This creates a Prometheus Probe that issues HTTP GET requests at a regular interval and checks for a successful (2xx) response.

  ```yaml
  # package.yaml
  apiVersion: uds.dev/v1alpha1
  kind: Package
  metadata:
    name: my-app
    namespace: my-app
  spec:
    network:
      expose:
        # monitors: https://myapp.uds.dev/
        - service: my-app
          host: myapp
          gateway: tenant
          port: 8080
          uptime:
            checks:
              paths:
                - /
  ```

- (Optional) Monitor custom health endpoints

  Specify multiple paths to monitor specific health endpoints on a single service.

  ```yaml
  # package.yaml
  spec:
    network:
      expose:
        # monitors: https://myapp.uds.dev/health and https://myapp.uds.dev/ready
        - service: my-app
          host: myapp
          gateway: tenant
          port: 8080
          uptime:
            checks:
              paths:
                - /health
                - /ready
  ```

- (Optional) Monitor multiple services

  Add uptime checks to multiple expose entries within a single `Package` CR to monitor several services at once.

  ```yaml
  # package.yaml
  spec:
    network:
      expose:
        # monitors: https://app.uds.dev/healthz, https://api.uds.dev/health,
        #           https://api.uds.dev/ready, https://app.admin.uds.dev/
        - service: frontend
          host: app
          gateway: tenant
          port: 3000
          uptime:
            checks:
              paths:
                - /healthz
        - service: api
          host: api
          gateway: tenant
          port: 8080
          uptime:
            checks:
              paths:
                - /health
                - /ready
        - service: admin
          host: app
          gateway: admin
          port: 8080
          uptime:
            checks:
              paths:
                - /
  ```

- (Optional) Monitor Authservice-protected applications

  For applications protected by Authservice, add `uptime.checks` to the expose entry as normal. The UDS Operator detects the `enableAuthserviceSelector` on the matching SSO entry and automatically:

  - Creates a Keycloak service account client (`<clientId>-probe`) with an audience mapper scoped to the application's SSO client
  - Configures the Blackbox Exporter with an OAuth2 module that obtains a token via client credentials before probing

  No additional configuration is required beyond adding `uptime.checks.paths`:

  ```yaml
  # package.yaml
  apiVersion: uds.dev/v1alpha1
  kind: Package
  metadata:
    name: my-app
    namespace: my-app
  spec:
    sso:
      - name: My App
        clientId: uds-my-app
        redirectUris:
          - "https://myapp.uds.dev/login"
        enableAuthserviceSelector:
          app: my-app
    network:
      expose:
        - service: my-app
          host: myapp
          gateway: tenant
          port: 8080
          uptime:
            checks:
              paths:
                - /healthz
  ```

  The operator matches the expose entry to the SSO entry via the redirect URI origin (`https://myapp.uds.dev`) and configures the probe to authenticate transparently through Authservice.

- Deploy your Package

  (Recommended) Include the `Package` CR in your Zarf package, then create and deploy it. See Packaging applications for general packaging guidance.

  ```bash
  uds zarf package create --confirm
  uds zarf package deploy zarf-package-*.tar.zst --confirm
  ```

  Or apply the `Package` CR directly for quick testing:

  ```bash
  uds zarf tools kubectl apply -f package.yaml
  ```

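The `# monitors:` comments in the examples above follow a consistent pattern: the `host` and `gateway` of each expose entry determine the hostname, and each entry in `uptime.checks.paths` becomes one probed URL. The Python sketch below reproduces that pattern for illustration only. It is not the operator's implementation; the function name `monitored_urls`, the `domain` parameter, and the restriction to the `tenant` and `admin` gateways are assumptions drawn from the example comments.

```python
def monitored_urls(expose_entries, domain="uds.dev"):
    """Derive the probed URLs described by the `# monitors:` comments.

    Assumption (taken from the examples, not from the operator source):
    tenant-gateway hosts resolve to <host>.<domain> and admin-gateway
    hosts resolve to <host>.admin.<domain>.
    """
    urls = []
    for entry in expose_entries:
        gateway = entry["gateway"]
        if gateway == "tenant":
            fqdn = f"{entry['host']}.{domain}"
        elif gateway == "admin":
            fqdn = f"{entry['host']}.admin.{domain}"
        else:
            # Other gateways are outside the scope of this sketch.
            raise ValueError(f"unsupported gateway in sketch: {gateway}")

        # Entries without an uptime block contribute no probes.
        paths = entry.get("uptime", {}).get("checks", {}).get("paths", [])
        for path in paths:
            urls.append(f"https://{fqdn}{path}")
    return urls


# The expose entries from the multi-service example, as plain dictionaries.
expose = [
    {"service": "frontend", "host": "app", "gateway": "tenant", "port": 3000,
     "uptime": {"checks": {"paths": ["/healthz"]}}},
    {"service": "api", "host": "api", "gateway": "tenant", "port": 8080,
     "uptime": {"checks": {"paths": ["/health", "/ready"]}}},
    {"service": "admin", "host": "app", "gateway": "admin", "port": 8080,
     "uptime": {"checks": {"paths": ["/"]}}},
]

for url in monitored_urls(expose):
    print(url)
```

Run against the multi-service example, this prints the four URLs listed in that example's `# monitors:` comment, which is a quick way to sanity-check which endpoints a given `Package` CR should end up probing.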
Verification
Confirm uptime monitoring is working:
- Open Grafana and navigate to Dashboards, then UDS / Monitoring / Probe Uptime, to see the uptime dashboard
- The dashboard displays an uptime status timeline, percentage uptime, and TLS certificate expiration dates
- Query `probe_success` in Grafana Explore to check individual probe status
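If you would rather check probe status from a script than from Grafana Explore, the same `probe_success` query can be sent to the Prometheus HTTP API, which returns instant-query results as JSON. The sketch below only parses such a response. The sample payload, the helper name `failed_probes`, and the example instances are hypothetical, and how you reach Prometheus in your cluster is deliberately left out.

```python
def failed_probes(query_response):
    """Return the instances whose probe_success value is 0.

    Expects the JSON body of a Prometheus instant query
    (GET /api/v1/query?query=probe_success), already decoded into a dict.
    """
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")

    failed = []
    for series in query_response["data"]["result"]:
        # Instant-vector samples are [unix_timestamp, "value-as-string"].
        _timestamp, value = series["value"]
        if float(value) == 0:
            failed.append(series["metric"].get("instance", "<unknown>"))
    return sorted(failed)


# Hypothetical response body for illustration.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "probe_success",
                        "instance": "https://myapp.uds.dev/health"},
             "value": [1700000000, "1"]},
            {"metric": {"__name__": "probe_success",
                        "instance": "https://myapp.uds.dev/ready"},
             "value": [1700000000, "0"]},
        ],
    },
}

print(failed_probes(sample))
```

An empty list means every probe in the response reported success at query time.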
Available metrics
Blackbox Exporter provides the following key metrics for alerting and dashboarding:
| Metric | Description |
|---|---|
| `probe_success` | Whether the probe succeeded (1) or failed (0) |
| `probe_duration_seconds` | Total probe duration, in seconds |
| `probe_http_status_code` | HTTP response status code |
| `probe_ssl_earliest_cert_expiry` | Earliest SSL certificate expiration, as a Unix timestamp |
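Two of these metrics need a small amount of arithmetic before they are easy to read: `probe_ssl_earliest_cert_expiry` is a Unix timestamp in seconds, and an uptime percentage is the mean of `probe_success` samples over a window. The following is a minimal Python sketch of both calculations; the helper names and sample values are hypothetical, and the dashboard may compute its figures differently.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_timestamp, now=None):
    """Convert a probe_ssl_earliest_cert_expiry value to days remaining.

    A negative result means the certificate has already expired.
    """
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return (expiry_timestamp - now) / SECONDS_PER_DAY


def uptime_percentage(samples):
    """Average a series of probe_success samples (1 or 0) into a percentage."""
    if not samples:
        raise ValueError("no samples to average")
    return 100.0 * sum(samples) / len(samples)


# Hypothetical values: a certificate expiring 14 days after "now",
# and four probe results with one failure.
now = 1_700_000_000
print(days_until_expiry(now + 14 * SECONDS_PER_DAY, now=now))
print(uptime_percentage([1, 1, 0, 1]))
```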
Example PromQL queries:

```promql
# Check all probes and their success status
probe_success

# Check if a specific endpoint is up
probe_success{instance="https://myapp.uds.dev/health"}
```

Troubleshooting
Problem: Probe showing as failed
Symptom: The uptime dashboard shows a probe in a failed state.
Solution: Verify the endpoint is reachable from within the cluster. Check application health and any network policies that might block the probe.
Problem: Probe not appearing
Symptom: No probe data shows up in Grafana after applying the `Package` CR.
Solution: Verify `uptime.checks.paths` is set in the expose entry. Check the `Package` CR status:

```bash
uds zarf tools kubectl describe package <name> -n <namespace>
```

Problem: Authservice-protected probe failing
Symptom: Probe returns authentication errors for an SSO-protected application.
Solution: Check that the probe Keycloak client was created by reviewing operator logs. Verify the SSO entry's redirect URI origin matches the expose entry's FQDN.
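To check the origin match by hand, compare the scheme and host of each redirect URI against the expose entry's FQDN. The Python sketch below shows one way to do that comparison; the helper names are hypothetical, and the operator's own matching logic may differ in details this sketch does not cover.

```python
from urllib.parse import urlsplit


def origin(url):
    """Return the scheme://host[:port] portion of a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def redirect_matches_expose(redirect_uris, expose_fqdn):
    """Check whether any redirect URI shares its origin with the exposed host.

    Assumes the exposed application is served over HTTPS at expose_fqdn.
    """
    expected = f"https://{expose_fqdn}"
    return any(origin(uri) == expected for uri in redirect_uris)


# Values from the Authservice example: the match succeeds.
print(redirect_matches_expose(["https://myapp.uds.dev/login"], "myapp.uds.dev"))

# A hostname typo in the SSO entry breaks the match.
print(redirect_matches_expose(["https://my-app.uds.dev/login"], "myapp.uds.dev"))
```

If the comparison fails for your values, correct the `redirectUris` or the `host` in the `Package` CR so both refer to the same hostname.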
Related documentation
- Prometheus: Blackbox Exporter: Upstream project documentation.
- Prometheus Operator: Probe API: Probe CRD field reference.
- Create metric alerting rules: Create custom alerts beyond the UDS Core default probe alerts.
- Monitoring & Observability concepts: Background on how the monitoring stack fits together in UDS Core.
- Monitoring & Observability reference: Default probes, recording rules, and how to disable built-in uptime probes.