eBPF for Linux Admins - Part 1

Published On: August 22, 2023

This article covers the basics of eBPF.

This article is for Linux Administrators who are trying to demystify eBPF.

Pre-requisites

This article assumes that the reader has a good understanding of Linux networking and is familiar with packet tracing using tcpdump.

Some of the internals were intentionally excluded to simplify the topic.

The Berkeley Packet Filter (BPF)

BPF is easiest to understand through tcpdump, so we will first discuss the basics of how tcpdump works.

Let’s take a scenario where you want to observe all ARP packets arriving at the network interface of a Linux system. Each packet first lands in the network device hardware and is then placed in an RX (receive) queue in the kernel.

For a user to examine the contents of a packet, that packet needs to be copied from kernel space to user space. Each of the packets then needs to be filtered based on its type; here, it's the ARP type.

Copying every packet from kernel space to user space, only to discard most of them there, is inefficient and affects system performance.

So how can we filter packets while they are still in kernel space and copy only the matching packets to user space?

This is where BPF, the Berkeley Packet Filter, comes in.

The BPF virtual machine is a pseudo VM inside the Linux kernel. For the sake of simplicity, you can consider this as a custom module loaded to the kernel.

To make the concept easier to understand, you can think of BPF as something like the JVM (Java Virtual Machine).

The BPF VM supports a limited set of instructions, and there are many restrictions on its usage as well.

The BPF VM (or pseudo-machine) consists of the following elements:

  • An accumulator [A] where the contents of the packet get loaded.
  • An index register [X].
  • A scratch memory area.
  • An implicit Program Counter.

The filters we pass to the tcpdump command are converted into “byte code” and then injected directly into the kernel. (More about byte code is coming later in this article.)

The load instructions load packet data into the accumulator, where the BPF VM can then examine it.

Let’s examine the code generated by the tcpdump command that filters the ARP packets coming to interface ens33.

[root@localhost ~]# tcpdump -i ens33 arp -d
(000) ldh      [12]
(001) jeq      #0x806           jt 2    jf 3
(002) ret      #262144
(003) ret      #0
[root@localhost ~]#

Explanation

(000) ldh - Load a half word (16 bits) from byte offset 12 of the packet into the accumulator
(001) jeq - If the accumulator value is 0x806, i.e. an ARP packet, jump to 2; otherwise jump to 3
(002) ret - Accept the packet and return up to 262144 bytes of it, i.e. the entire packet or [max snapshot length](https://github.com/the-tcpdump-group/tcpdump/blob/tcpdump-4.9/netdissect.h#L263)
(003) ret - Return 0 bytes, i.e. discard the packet
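To make the control flow concrete, here is a minimal sketch of an interpreter for just these four instructions. It is not how the kernel implements BPF; the tuple format, the `run_filter` and `make_frame` helpers, and the sample MAC address are all hypothetical and exist only for illustration.

```python
import struct

# Each instruction is (mnemonic, operand, jt, jf). The jump targets are the
# absolute instruction numbers that `tcpdump -d` prints.
ARP_FILTER = [
    ("ldh", 12, None, None),      # (000) A = 16-bit value at packet offset 12
    ("jeq", 0x806, 2, 3),         # (001) A == 0x806 ? goto 2 : goto 3
    ("ret", 262144, None, None),  # (002) accept up to 262144 bytes
    ("ret", 0, None, None),       # (003) accept 0 bytes, i.e. drop
]


def run_filter(program, packet):
    """Return how many bytes of the packet the filter accepts (0 = drop)."""
    a = 0   # accumulator [A]
    pc = 0  # implicit program counter
    while True:
        op, k, jt, jf = program[pc]
        if op == "ldh":
            if k + 2 > len(packet):
                return 0  # a load past the end of the packet drops it
            # Load a big-endian half word from byte offset k.
            (a,) = struct.unpack_from("!H", packet, k)
            pc += 1
        elif op == "jeq":
            pc = jt if a == k else jf
        elif op == "ret":
            # The return value is a snapshot length, capped at the packet size.
            return min(k, len(packet))
        else:
            raise ValueError(f"unsupported instruction: {op}")


def make_frame(ethertype):
    """Build a hypothetical Ethernet II frame with a dummy 46-byte payload."""
    dst = bytes.fromhex("ffffffffffff")
    src = bytes.fromhex("000c29aabbcc")  # made-up source MAC
    return dst + src + struct.pack("!H", ethertype) + bytes(46)


arp_frame = make_frame(0x0806)
ipv4_frame = make_frame(0x0800)
print(run_filter(ARP_FILTER, arp_frame))   # 60: the whole 60-byte frame is accepted
print(run_filter(ARP_FILTER, ipv4_frame))  # 0: the frame is discarded
```

Only the accumulator and the program counter are needed for this filter; the index register and scratch memory are left out of the sketch.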

You can find more details on the inner workings of BPF in this USENIX paper.

So the above filter skips the 6-byte destination and 6-byte source MAC fields and then loads 16 bits from byte offset 12, which holds the packet type.

If those 16 bits are 0x0806 (00001000 00000110), the packet is an ARP packet and the filter matches.
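As a quick check of the arithmetic, the short sketch below prints the ARP type value in hex, binary, and decimal. The constant name `ETHERTYPE_ARP` is chosen here for illustration.

```python
ETHERTYPE_ARP = 0x0806

# Bytes 12 and 13 of the frame hold the type field, most significant byte first.
type_field = ETHERTYPE_ARP.to_bytes(2, "big")
bits = " ".join(f"{byte:08b}" for byte in type_field)

print(type_field.hex())  # 0806
print(bits)              # 00001000 00000110
print(ETHERTYPE_ARP)     # 2054
```

The decimal value 2054 is the same number that appears in the IANA table and in the decimal bytecode printed by tcpdump.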

A few points to note:

  • An Ethernet II frame has the following format:

    +--------------------+--------------------+-------------+----------------+-----+
    | 6 Byte Dest. Mac   | 6 Byte Source Mac  | 2 Byte Type | 46 - 1500 data | FCS |
    +--------------------+--------------------+-------------+----------------+-----+
    
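  • Multi-byte fields in an Ethernet frame are transmitted in big-endian (network) byte order.

  • On a 32-bit system, a full word is 32 bits and a half word is 16 bits; BPF uses the same sizes.

  • 1 byte = 8 bits, 2 bytes = 16 bits

  • You can find the hexadecimal Ethernet type values for each packet type in the IANA registry: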

    +---------------------+-----------------+-------------------------+-----------------------+-----------------------------------+-----------+
    | Ethertype (decimal) | Ethertype (hex) | Exp. Ethernet (decimal) | Exp. Ethernet (octal) | Description                       | Reference |
    +---------------------+-----------------+-------------------------+-----------------------+-----------------------------------+-----------+
    | 2054                | 0806            | -                       | -                     | Address Resolution Protocol (ARP) | [RFC7042] |
    +---------------------+-----------------+-------------------------+-----------------------+-----------------------------------+-----------+
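The points above can be sketched in a few lines of Python: the example splits the first 14 bytes of a frame into destination MAC, source MAC, and type, and shows why the byte order matters. The frame contents and the `parse_ethernet_header` helper are hypothetical.

```python
import struct


def parse_ethernet_header(frame):
    """Split the first 14 bytes of an Ethernet II frame into its fields."""
    # "!" selects network (big-endian) byte order; 6s = 6 raw bytes, H = 16 bits.
    dst, src, ethertype = struct.unpack_from("!6s6sH", frame, 0)
    return dst.hex(":"), src.hex(":"), ethertype


# Hypothetical frame: broadcast destination, made-up source MAC, type 0x0806,
# followed by a dummy 46-byte payload (the FCS is omitted).
frame = bytes.fromhex("ffffffffffff" "000c29aabbcc" "0806") + bytes(46)

dst, src, ethertype = parse_ethernet_header(frame)
print(dst, src, hex(ethertype))  # ff:ff:ff:ff:ff:ff 00:0c:29:aa:bb:cc 0x806

# Reading the same two bytes as little-endian gives a different value.
wrong = int.from_bytes(frame[12:14], "little")
print(hex(wrong))  # 0x608
```

The `bytes.hex(":")` separator form needs Python 3.8 or later.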
    

The Byte Code

The BPF program we discussed above can be converted to byte code.

What is byte code?

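Byte code is a compact instruction format that is executed by a virtual machine (VM) rather than directly by the CPU.

In this case, the VM is the BPF pseudo VM sitting inside the kernel.

User space can inject this bytecode into the BPF pseudo VM. The kernel then either interprets it or, when the JIT compiler is enabled, translates it into architecture-dependent machine code that can be executed directly on the hardware.

tcpdump itself can print the bytecode of a BPF program. The -ddd option prints the instruction count, followed by one line per instruction with the opcode, the jump-if-true offset, the jump-if-false offset, and the operand as decimal numbers.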

[root@localhost ~]# tcpdump -i ens33 arp -ddd
4
40 0 0 12
21 0 1 2054
6 0 0 262144
6 0 0 0
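To see how these numbers relate to the earlier `-d` listing, the sketch below parses the output and prints it back in mnemonic form. The opcode values come from the classic BPF definitions in the Linux headers; the `parse_ddd` and `describe` helpers are hypothetical and only handle the three opcodes used by this filter.

```python
DDD_OUTPUT = """\
4
40 0 0 12
21 0 1 2054
6 0 0 262144
6 0 0 0
"""

# Opcodes for the three instructions used by this filter.
OPCODES = {
    0x28: "ldh",  # BPF_LD | BPF_H | BPF_ABS
    0x15: "jeq",  # BPF_JMP | BPF_JEQ | BPF_K
    0x06: "ret",  # BPF_RET | BPF_K
}


def parse_ddd(text):
    """Parse `tcpdump -ddd` output into (code, jt, jf, k) tuples."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    program = [tuple(int(field) for field in line.split()) for line in lines[1:]]
    if len(program) != count:
        raise ValueError("instruction count does not match the first line")
    return program


def describe(program):
    """Render each instruction roughly the way `tcpdump -d` prints it."""
    out = []
    for pc, (code, jt, jf, k) in enumerate(program):
        name = OPCODES.get(code, f"op {code:#x}")
        if name == "jeq":
            # jt and jf are offsets relative to the next instruction.
            out.append(f"({pc:03d}) jeq #{k:#x} jt {pc + 1 + jt} jf {pc + 1 + jf}")
        elif name == "ldh":
            out.append(f"({pc:03d}) ldh [{k}]")
        else:
            out.append(f"({pc:03d}) {name} #{k}")
    return out


program = parse_ddd(DDD_OUTPUT)
for line in describe(program):
    print(line)
```

Note that the bytecode stores jump offsets relative to the next instruction (0 and 1), while the `-d` listing shows them as absolute instruction numbers (2 and 3).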

The bytecode can be injected into the system in different ways. The tcpdump utility has its own logic to do this operation.
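One common way on Linux is the `SO_ATTACH_FILTER` socket option. The sketch below shows the idea from Python and is not tcpdump's own code. It assumes the option value 26 (the generic Linux value, which Python's socket module may not expose), and the helper names are hypothetical. Opening the raw socket requires root, so nothing here runs until `open_arp_socket` is called.

```python
import ctypes
import socket
import struct

# Assumption: 26 is the generic Linux value of SO_ATTACH_FILTER.
SO_ATTACH_FILTER = getattr(socket, "SO_ATTACH_FILTER", 26)
ETH_P_ALL = 0x0003  # receive every protocol; the filter narrows it down

# The (code, jt, jf, k) tuples printed by `tcpdump -i ens33 arp -ddd`.
ARP_PROGRAM = [
    (40, 0, 0, 12),
    (21, 0, 1, 2054),
    (6, 0, 0, 262144),
    (6, 0, 0, 0),
]


def pack_program(program):
    """Pack instructions as an array of struct sock_filter (u16, u8, u8, u32)."""
    return b"".join(struct.pack("HBBI", *ins) for ins in program)


def attach_filter(sock, program):
    """Attach a classic BPF program to a socket via SO_ATTACH_FILTER."""
    insns = ctypes.create_string_buffer(pack_program(program))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", len(program), ctypes.addressof(insns))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)


def open_arp_socket(ifname):
    """Open a raw packet socket on ifname that only delivers ARP frames."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind((ifname, 0))
    attach_filter(sock, ARP_PROGRAM)
    return sock
```

As root, `open_arp_socket("ens33").recv(65535)` would block until an ARP frame arrives on that interface, because the kernel drops everything else before it reaches user space.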

With that, we conclude Part 1 of eBPF for Linux Admins.

In the next part, we will discuss eBPF or extended BPF.


About the Author

Ansil Hameed Kunju

Ansil has more than a decade of experience in different IT domains. He is an expert in DevOps. His skill set includes Linux, GCP, AWS, VMware, Nutanix, Rancher, Docker, Git, Python, Golang, Kubernetes, Istio, Prometheus, Grafana, ArgoCD, Jenkins, StackStorm and other CNCF projects.