# Too many open files

## When to use this runbook

Use this runbook when:
- `istio-cni-node` pods crash loop on startup, or `ztunnel` pods stay stuck in `ContainerCreating`
- `vector` pods fail to start or stop collecting logs from cluster workloads
- Any pod logs `too many open files` or `watcher create: too many open files`
Example error from `istio-cni` pods:

```
error  cni-agent  installer failed: create CNI config file: watcher create: too many open files
Error: create CNI config file: watcher create: too many open files
```

## Overview

This error means a process tried to create an inotify instance or open a file descriptor and hit a kernel limit. File watch-heavy components such as Istio CNI, ztunnel, and Vector consume inotify instances and watches quickly, so they trip these limits before most workloads do.
This is typically caused by one of the following:
- Low `fs.inotify.max_user_instances` on the host: the kernel limit on inotify instances per UID is exhausted, so new watchers fail to register (a sketch for inspecting per-process instance usage follows this list)
- Low `fs.inotify.max_user_watches` on the host: the per-UID watch count is exhausted, common when Vector tails many log files
- Low `fs.file-max` or `fs.nr_open`: the global or per-process open file descriptor limits are too low for the workloads running on the node
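To see how close a node is to the instance limit, you can count inotify consumption directly: every open inotify instance appears as an `anon_inode:inotify` file descriptor under `/proc`. The following host-side sketch is not part of the original runbook; it assumes a Linux host with GNU `find`:

```bash
# Count inotify instances per process (each matching fd is one instance
# charged against that process's UID). Run as root for full visibility.
find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null \
  | cut -d/ -f3 | sort | uniq -c | sort -rn \
  | while read -r count pid; do
      printf '%5d  pid=%-8s %s\n' "$count" "$pid" \
        "$(tr '\0' ' ' </proc/"$pid"/cmdline 2>/dev/null)"
    done
```

Processes such as `istio-cni-node`, `ztunnel`, or `vector` near the top of this output, combined with a low `fs.inotify.max_user_instances`, point at the first cause above.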
## Pre-checks

1. **Confirm the symptom in pod logs**

   Check the logs of the affected pod for `too many open files`. For example, with `istio-cni`:

   ```bash
   uds zarf tools kubectl logs -n istio-system -l k8s-app=istio-cni-node --tail=50
   ```

   Look for lines containing `too many open files` or `watcher create: too many open files`. Their presence confirms the kernel limit is the root cause.
2. **Check current host limits**

   On the host (see the Overview note), inspect the active values:

   ```bash
   sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches fs.file-max fs.nr_open
   ```

   What to look for: values at or above the Vector requirements baseline. Anything significantly lower indicates the host needs to be tuned; a scripted comparison follows this list.

   Example of acceptable output:

   ```
   fs.inotify.max_user_instances = 1024
   fs.inotify.max_user_watches = 1048576
   fs.file-max = 13181250
   fs.nr_open = 13181250
   ```
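If you prefer a scripted comparison, the sketch below checks each value against the baseline quoted above. The numbers are taken from the example output in this runbook; substitute your own baseline if the Vector requirements change:

```bash
# Flag any sysctl that is below the baseline values shown above.
declare -A baseline=(
  ["fs.inotify.max_user_instances"]=1024
  ["fs.inotify.max_user_watches"]=1048576
  ["fs.file-max"]=13181250
  ["fs.nr_open"]=13181250
)
for key in "${!baseline[@]}"; do
  current="$(sysctl -n "$key")"
  if [ "$current" -lt "${baseline[$key]}" ]; then
    echo "LOW $key = $current (baseline ${baseline[$key]})"
  else
    echo "ok  $key = $current"
  fi
done
```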
## Procedure

Apply the kernel parameters from the Vector requirements on the host running the cluster.
1. **Raise the relevant `sysctl` values on the host**

   On the host (see the Overview note), run the following as root (prefix with `sudo` if you are not running as root):

   ```bash
   declare -A sysctl_settings

   # Max open files system-wide and per-process; raise so watch-heavy workloads don't exhaust descriptors
   sysctl_settings["fs.nr_open"]=13181250
   sysctl_settings["fs.file-max"]=13181250

   # inotify instances per UID; Istio CNI and ztunnel each consume several
   sysctl_settings["fs.inotify.max_user_instances"]=1024

   # inotify watches per UID; Vector tails every log file in the cluster
   sysctl_settings["fs.inotify.max_user_watches"]=1048576

   for key in "${!sysctl_settings[@]}"; do
     value="${sysctl_settings[$key]}"
     sysctl -w "$key=$value"
     echo "$key=$value" > "/etc/sysctl.d/$key.conf"
   done

   sysctl --system
   ```
2. **Restart affected pods**

   Once the host limits are raised, recycle the failing pods so they can reinitialize. For example, with `istio-cni` and `ztunnel`:

   ```bash
   uds zarf tools kubectl rollout restart daemonset/istio-cni-node daemonset/ztunnel -n istio-system
   ```

   Restart any other workloads that were logging the error. A sketch for waiting on the restarted daemonsets follows this list.
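To block until the restarts complete before moving on to verification, `kubectl rollout status` can wait on each daemonset. A minimal sketch, assuming only the two daemonsets above were restarted:

```bash
# Wait up to two minutes for each restarted daemonset to become ready again.
for ds in istio-cni-node ztunnel; do
  uds zarf tools kubectl rollout status "daemonset/$ds" -n istio-system --timeout=120s
done
```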
## Verification

After applying the fix, confirm the affected pods recover. For example, with `istio-cni`:

```bash
uds zarf tools kubectl get pods -n istio-system
uds zarf tools kubectl logs -n istio-system -l k8s-app=istio-cni-node --tail=50 | grep -i "too many open files" || echo "no matches"
```

Success indicators:
- Affected pods are `Running` and ready
- Their logs no longer contain `too many open files` (the sweep below extends this check to the other workloads)
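The same log check can be applied to the other workloads this runbook names. The label selectors and the `vector` namespace below are assumptions, not values from the original runbook; adjust them to match the labels and namespaces in your cluster:

```bash
# Assumed selectors/namespaces -- verify against your cluster before relying on them.
uds zarf tools kubectl logs -n istio-system -l app=ztunnel --tail=50 \
  | grep -i "too many open files" || echo "ztunnel: no matches"
uds zarf tools kubectl logs -n vector -l app.kubernetes.io/name=vector --tail=50 \
  | grep -i "too many open files" || echo "vector: no matches"
```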
## Additional help

If this runbook doesn’t resolve your issue:

- Capture current host limits and Istio CNI logs (a bundling sketch follows this list):

  ```bash
  sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches fs.file-max fs.nr_open > host-limits.txt
  uds zarf tools kubectl logs -n istio-system -l k8s-app=istio-cni-node > istio-cni.log
  ```

- Check UDS Core GitHub Issues for known issues
- Open a new issue with the captured output attached
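If you are attaching both captures to an issue, one optional way to bundle them (the archive name here is arbitrary):

```bash
tar czf uds-too-many-open-files-diag.tgz host-limits.txt istio-cni.log
```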
## Related documentation

- Vector requirements - sysctl values known to work well for UDS Core
- `inotify(7)` man page - upstream reference for inotify instances, watches, and the `max_user_*` limits
- Linux kernel `/proc/sys/fs/` documentation - upstream reference for `fs.file-max`, `fs.nr_open`, and related sysctls