2026-05-01
Open Source

Enhancing Deployment Reliability at GitHub: Using eBPF to Break Circular Dependencies

GitHub uses eBPF to monitor and block network calls during deployments, preventing circular dependencies that could block fixes during outages. Learn about dependency types, eBPF's role, and how to implement it.

At GitHub, we run our entire infrastructure on github.com itself, which creates a unique challenge: if the platform goes down, we can't access the very code needed to fix it. This circular dependency is a critical risk. To address it, we've turned to eBPF, a powerful kernel technology that allows us to monitor and block dangerous network calls during deployments. Below, we answer common questions about our approach, the types of circular dependencies we face, and how eBPF helps keep GitHub safe and available.

What is a circular dependency in the context of GitHub's deployment?

A circular dependency occurs when a deployment process relies on a service that is itself affected by the deployment or by an ongoing outage. For GitHub, the most obvious example is using github.com itself to download code or tools needed to fix the platform. If GitHub is down, you can't download the fix—creating a loop. This isn't just theoretical; it's a real risk when deploying to stateful hosts like MySQL nodes. For instance, a deploy script might try to pull the latest release of an open-source tool from GitHub. If GitHub is unavailable (say, due to a MySQL outage), the script fails. This direct dependency is just one of several types we've identified.

Source: github.blog

Why is a circular dependency particularly dangerous when GitHub itself is down?

When github.com experiences an outage, our entire deployment pipeline can grind to a halt because the scripts often need to fetch assets, binaries, or configuration data from the very platform that's broken. This creates a critical chicken-and-egg problem: to restore service, we need to deploy a fix, but to deploy that fix, we need the service to be up. Without safeguards, a minor incident can snowball into a prolonged outage. That's why we maintain offline mirrors and built assets for rollback—but even these can't prevent all circular dependencies, especially hidden and transient ones. The danger is that the failure propagates, making it impossible to recover without manual, error-prone workarounds.

What are the different types of circular dependencies GitHub identified?

We categorize circular dependencies into three main types. Direct dependencies are obvious: a deploy script attempts to pull a release or binary from GitHub, failing if GitHub is down. Hidden dependencies are subtler—a tool already on the machine (like a servicing script) checks for an update online; if it can't reach GitHub, it may hang or fail. Transient dependencies involve a chain: a deploy script calls an internal API (e.g., a migration service), which in turn tries to fetch a tool from GitHub, propagating the failure back. All three can block deployments during an incident, making it critical to detect and block them proactively.

How did GitHub traditionally try to prevent circular dependencies?

Historically, the responsibility fell on each team owning stateful hosts to manually review their deployment scripts for any reliance on GitHub or other internal services that could be impacted. This was time-consuming and error-prone. Dependencies are often not obvious—hidden in third-party tools, scripts, or even library calls. As our infrastructure grew, this manual approach became unsustainable. Teams would miss dependencies, leading to failures exactly when they were most critical. We needed a systematic, automated way to enforce deployment safety without requiring every team to become experts in dependency analysis.

What is eBPF and how does it help with deployment safety?

eBPF (extended Berkeley Packet Filter) is a Linux kernel technology that allows you to run sandboxed programs within the kernel to observe or control system behavior. For deployment safety, we use eBPF to monitor network calls made by deployment scripts in real time. By attaching eBPF programs to system calls like connect() or sendto(), we can see exactly where a script tries to connect—including calls to GitHub or other services. If a connection matches a known dangerous pattern (e.g., a circular dependency), we can block it or log it for analysis. This gives us fine-grained control without modifying the scripts themselves, making it ideal for enforcing deployment policies.


How does eBPF selectively monitor and block calls during deployment?

During a deployment, we run eBPF programs that intercept system calls like connect and send. Each call is evaluated against a set of rules defined in our deployment policy. For example, we can specify that during a critical MySQL fix, no network calls to github.com are allowed. The eBPF program checks the destination IP or domain and, if it matches a blocked pattern, the call is rejected with an error (e.g., ECONNREFUSED). Alternatively, we can allow the call but log it for auditing. This approach is non-intrusive: the deployment scripts themselves don't change, and the policy can be updated without restarting services. By running eBPF programs in the kernel, we achieve low-latency monitoring and enforcement at scale.

What were the key findings from GitHub's evaluation of eBPF for deployment safety?

After evaluating eBPF, we found it to be a highly effective solution for preventing circular dependencies. Key findings include:

1. eBPF can detect and block all three types of circular dependencies (direct, hidden, and transient) because it sees every network call.
2. The performance overhead is minimal (under 1% CPU in our tests), so it doesn't slow down deployments.
3. Writing eBPF programs is more accessible than expected; developers with basic C knowledge can create effective filters.
4. The policy-driven approach allows us to enforce different rules for different services (e.g., stricter rules for stateful hosts).
5. eBPF gives us visibility into previously unknown dependencies, helping teams clean up their deployment scripts.

Overall, we concluded that eBPF is a powerful tool for improving deployment safety and reliability.

How can developers get started writing their own eBPF programs for deployment safety?

To start, you need a Linux system running kernel 4.19 or newer with the BCC toolkit (or libbpf) installed. Begin by writing a simple eBPF program in C that attaches to the connect syscall; for example, print the destination IP of every outgoing TCP connection. Tools like bpftrace make quick experiments even easier. For production use, consider a framework such as libbpf or the Cilium project's libraries, which provide higher-level abstractions. GitHub's approach involved creating a custom eBPF program that matches a list of blocked domains. We recommend starting with a proof of concept that only logs connections during a deployment, then gradually adding blocking rules. Test thoroughly: the in-kernel verifier rejects unsafe programs, but an overly broad blocking rule can cut off legitimate traffic and stall the very deployment it is meant to protect. Our documentation and examples are available in our public repositories to help you get started.