cilium/ebpf: writing the Go side of an eBPF program
April 19, 2026 · kernel notes · ebpf · go · cilium
Most production eBPF code is split into two parts: a C program that runs in the kernel, and a userspace controller that loads it, configures it, and reads its data. The Go ecosystem has converged on github.com/cilium/ebpf as the library for the controller side. It is mature, well-maintained, and the closest thing to a standard.
This is a practical tour of how to use it without the surprises.
The big picture
Your workflow looks like:
- Write a C program in bpf/your_program.c.
- Generate Go bindings via bpf2go (a code generator that ships with cilium/ebpf).
- In Go, load the compiled object and get back struct fields for each map and program.
- Attach the program to an interface or hook.
- Read/write maps, consume ringbuf events, etc.
bpf2go is the integration point that makes this ergonomic. It compiles the C with clang, embeds the resulting object file as a Go variable, and generates typed accessors for every map and program defined in the C source.
Setting up bpf2go
In your Go package:
package bpf
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go bpf ../../bpf/xdp_counter.c
Pin bpf2go as a module dependency (go get github.com/cilium/ebpf/cmd/bpf2go) rather than using @latest, so the generator version stays in sync with the library version in go.mod. Run go generate ./... and you get bpf_bpfel.go and bpf_bpfeb.go (little- and big-endian), plus the compiled objects bpf_bpfel.o and bpf_bpfeb.o that they embed. Build tags in the generated files select the right pair automatically. Don’t edit these — they regenerate on every go generate.
The generated code gives you bpfObjects, bpfPrograms, and bpfMaps structs. Each map and program in your C is a typed field.
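For the xdp_counter example, the generated code looks roughly like this — a sketch, since the exact field names are derived from your C identifiers (here a program count_ipv4 and a map pkt_count are assumed):

```go
package bpf

import "github.com/cilium/ebpf"

// Sketch of bpf2go output. The struct tags tie each field to the
// corresponding object in the embedded ELF by its C name.
type bpfObjects struct {
	bpfPrograms
	bpfMaps
}

type bpfPrograms struct {
	CountIpv4 *ebpf.Program `ebpf:"count_ipv4"`
}

type bpfMaps struct {
	PktCount *ebpf.Map `ebpf:"pkt_count"`
}
```

Because the fields are typed, a renamed or deleted map in the C source becomes a compile error in Go rather than a runtime surprise.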
Loading and attaching
import (
"github.com/cilium/ebpf"
"github.com/cilium/ebpf/link"
"github.com/cilium/ebpf/rlimit"
)
// Needed on kernels before 5.11, which account BPF memory against RLIMIT_MEMLOCK
if err := rlimit.RemoveMemlock(); err != nil {
log.Fatal("removing memlock:", err)
}
var objs bpfObjects
if err := loadBpfObjects(&objs, nil); err != nil {
log.Fatal("loading objects:", err)
}
defer objs.Close()
iface, err := net.InterfaceByName("eth0")
if err != nil {
log.Fatal("interface lookup:", err)
}
xdpLink, err := link.AttachXDP(link.XDPOptions{
Program: objs.CountIpv4,
Interface: iface.Index,
Flags: link.XDPGenericMode,
})
if err != nil {
log.Fatal("attaching XDP:", err)
}
defer xdpLink.Close()
XDPGenericMode runs in generic/SKB mode: it works on any interface but is slow, because packets are processed after the kernel has already built an skb. XDPDriverMode runs inside the NIC driver before skb allocation and requires native XDP support in the driver. XDPOffloadMode runs the program on the NIC itself and requires programmable hardware.
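In practice many agents try the fast path first and fall back. A sketch, reusing objs and iface from the snippet above:

```go
// Try native driver mode first; fall back to generic mode when the
// driver has no XDP support (common in VMs and on older NICs).
xdpLink, err := link.AttachXDP(link.XDPOptions{
	Program:   objs.CountIpv4,
	Interface: iface.Index,
	Flags:     link.XDPDriverMode,
})
if err != nil {
	log.Printf("driver-mode attach failed (%v), falling back to generic", err)
	xdpLink, err = link.AttachXDP(link.XDPOptions{
		Program:   objs.CountIpv4,
		Interface: iface.Index,
		Flags:     link.XDPGenericMode,
	})
	if err != nil {
		log.Fatal("attaching XDP:", err)
	}
}
defer xdpLink.Close()
```

Log which mode you ended up in — it matters when you compare throughput numbers later.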
Reading maps from userspace
key := uint32(0)
var values []uint64
if err := objs.PktCount.Lookup(key, &values); err != nil {
log.Printf("map lookup: %v", err)
}
possibleCPUs, _ := ebpf.PossibleCPU()
if len(values) != possibleCPUs {
log.Printf("unexpected per-cpu length: got %d, expected %d", len(values), possibleCPUs)
}
var total uint64
for _, v := range values {
total += v
}
For per-CPU maps, the cilium/ebpf library handles the per-CPU array marshaling. You pass a slice of the value type; it returns one entry per CPU. Forgetting this and reading a scalar gives you a confusing error.
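Writes work the same way in reverse — a sketch of resetting the counter, assuming the same per-CPU PktCount map:

```go
// Put on a per-CPU map takes one value per possible CPU.
possible, err := ebpf.PossibleCPU()
if err != nil {
	log.Fatal("possible CPUs:", err)
}
zero := make([]uint64, possible) // zero value for every CPU
if err := objs.PktCount.Put(uint32(0), zero); err != nil {
	log.Printf("resetting counter: %v", err)
}
```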
Iterating maps
var key uint32
var value uint64
iter := objs.SeenIps.Iterate()
for iter.Next(&key, &value) {
fmt.Printf("ip=%s seen=%d\n", netip.AddrFrom4(*(*[4]byte)(unsafe.Pointer(&key))), value)
}
if err := iter.Err(); err != nil {
log.Printf("iteration error: %v", err)
}
Iteration takes a snapshot conceptually — entries added during iteration may or may not appear, deletions may show as zero values. Don’t rely on iteration order.
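If you want to expire entries, collect keys during iteration and delete in a second pass — deleting from a hash map while iterating it can make the iterator skip or repeat entries. A sketch, assuming a hypothetical staleness criterion on the value:

```go
// Two-pass expiry: snapshot candidate keys, then delete them.
var (
	key   uint32
	value uint64
	stale []uint32
)
iter := objs.SeenIps.Iterate()
for iter.Next(&key, &value) {
	if value == 0 { // hypothetical staleness criterion
		stale = append(stale, key)
	}
}
if err := iter.Err(); err != nil {
	log.Printf("iteration error: %v", err)
}
for _, k := range stale {
	if err := objs.SeenIps.Delete(&k); err != nil && !errors.Is(err, ebpf.ErrKeyNotExist) {
		log.Printf("delete: %v", err)
	}
}
```

ErrKeyNotExist is expected here: the kernel side may have deleted the entry between your two passes.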
Consuming a ringbuf
import "github.com/cilium/ebpf/ringbuf"
reader, err := ringbuf.NewReader(objs.NewFlowEvents)
if err != nil {
log.Fatal("ringbuf reader:", err)
}
defer reader.Close()
for {
record, err := reader.Read()
if err != nil {
if errors.Is(err, ringbuf.ErrClosed) {
return
}
log.Printf("ringbuf read: %v", err)
continue
}
if len(record.RawSample) != 4 {
log.Printf("unexpected event size: %d", len(record.RawSample))
continue
}
// record.RawSample is a copy, safe to use after Read returns
srcIP := netip.AddrFrom4([4]byte{
record.RawSample[0], record.RawSample[1],
record.RawSample[2], record.RawSample[3],
})
fmt.Println("new flow from", srcIP)
}
Read() blocks until an event arrives or the reader is closed. Close it from your shutdown handler — that’s how you escape the loop on signal.
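A minimal shutdown wiring, sketched:

```go
// Close the reader when SIGINT/SIGTERM arrives; the blocked Read in the
// consume loop then returns ringbuf.ErrClosed and the loop exits.
sig := make(chan os.Signal, 1)
signal.Notify(sig, os.Interrupt, syscall.SIGTERM)
go func() {
	<-sig
	reader.Close()
}()
```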
Common gotchas
Struct layout mismatch. If your C struct is struct { __u8 a; __u32 b; } __attribute__((packed)), it is 5 bytes, while the natural Go struct { A uint8; B uint32 } is 8 bytes (3 bytes of padding before B) — every field after the mismatch reads garbage. Prefer the bpf2go-generated types, which are derived from BTF and always match. If you hand-write the Go side, keep both structs unpacked with identical field order and pin the layout with an embedded _ structs.HostLayout field (Go 1.23+). This is the single biggest source of “the values are wrong” bugs.
Forgetting Close(). Maps and programs hold kernel resources through file descriptors. Unpinned objects are released when the process exits, but anything pinned to bpffs outlives it, and you’ll see EEXIST errors on the next start while the old pins are still around. Always defer Close, and remove pins on shutdown.
Map full silently. Update returns unix.E2BIG when the map is full. If you don’t check, updates silently fail. Always check returns from Update and Put.
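Checking the full-map case explicitly — a sketch using the SeenIps map from earlier:

```go
// E2BIG from the kernel means the hash map hit max_entries.
key, value := uint32(0x0a000001), uint64(1) // hypothetical entry
if err := objs.SeenIps.Put(&key, &value); err != nil {
	if errors.Is(err, syscall.E2BIG) {
		log.Printf("map full, dropping entry for key %#x", key)
	} else {
		log.Printf("map update: %v", err)
	}
}
```

Whether you drop, evict, or use an LRU map variant instead is a design decision — the point is that a full map must be a visible event, not a silent one.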
XDP works in lab, fails in prod. Generic XDP runs anywhere, including in containers and VMs. Native XDP requires driver support and the right NIC. Test what you’ll deploy on; don’t assume.
Project layout
A typical layout that works:
project/
├── bpf/
│ └── xdp_counter.c
├── cmd/
│ └── agent-node/
│ └── main.go
├── internal/
│ ├── bpf/
│ │ ├── gen.go // go:generate directive
│ │ ├── bpf_bpfeb.go // generated
│ │ ├── bpf_bpfel.go // generated
│ │ └── loader.go // your wrapper
│ └── ...
├── go.mod
└── Dockerfile
The internal/bpf package owns loading, attaching, and exposing typed accessors. Your business logic in cmd/agent-node calls into that package. Keep the C close to the Go that consumes it.
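A sketch of what loader.go might expose — the Counter name and Attach signature are hypothetical, but the shape is the point: one exported type that owns the objects and the link, with Close tearing both down:

```go
package bpf

import (
	"fmt"
	"net"

	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/rlimit"
)

// Counter hides the generated bpfObjects behind a small API.
type Counter struct {
	objs bpfObjects
	link link.Link
}

// Attach loads the objects and attaches the XDP program to ifaceName.
func Attach(ifaceName string) (*Counter, error) {
	if err := rlimit.RemoveMemlock(); err != nil {
		return nil, fmt.Errorf("removing memlock: %w", err)
	}
	var objs bpfObjects
	if err := loadBpfObjects(&objs, nil); err != nil {
		return nil, fmt.Errorf("loading objects: %w", err)
	}
	iface, err := net.InterfaceByName(ifaceName)
	if err != nil {
		objs.Close()
		return nil, fmt.Errorf("interface %s: %w", ifaceName, err)
	}
	l, err := link.AttachXDP(link.XDPOptions{
		Program:   objs.CountIpv4,
		Interface: iface.Index,
		Flags:     link.XDPGenericMode,
	})
	if err != nil {
		objs.Close()
		return nil, fmt.Errorf("attaching XDP: %w", err)
	}
	return &Counter{objs: objs, link: l}, nil
}

// Close detaches the program and releases all kernel resources.
func (c *Counter) Close() error {
	c.link.Close()
	return c.objs.Close()
}
```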
cilium/ebpf has come a long way. The library is genuinely good now, the verifier errors are getting better, and bpf2go solves real problems. Most of the pain you’ll meet is C verifier complaints, not Go library issues.