2019/10/27 - Linux Kernel debugging with QEMU

EDIT:

2021-07-15: Added Appendix: Dockerfile for Kernel development and updated busybox + Kernel versions.
2023-11-23: Fixed ramdisk vs ramfs (ref), switched to devtmpfs and updated busybox + Kernel versions.

The other evening, while staring at some Linux kernel code, I thought: let me set up a minimal environment so I can easily step through the code and examine its state.

I ended up creating:

a Linux kernel with minimal configuration
a minimal initramfs to boot into which is based on busybox

In the remaining part of this article we go through each step: first building the kernel, then building the initrd, and finally running the kernel in QEMU and debugging it with GDB.

$> make kernel

Before building the kernel we first need to generate a configuration. As a starting point we generate a minimal config with the tinyconfig make target (make tinyconfig), which writes a .config file. We then customize the kernel using the fragment merge flow, which merges a fragment file into the current configuration by running the scripts/kconfig/merge_config.sh script.

Let's quickly go over some of the customizations. The following two lines enable support for an initial RAM file system (initramfs/initrd) and for gzip-compressed images:

CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
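Since only gzip decompression is enabled, the initramfs must actually be gzip-compressed. One way to double-check this is to look at the file's magic bytes. The following is a minimal sketch; the helper names magic_hex and initrd_config_option are hypothetical, and only a handful of formats (gzip, xz, bzip2, zstd and uncompressed newc cpio) are recognized.

```shell
#!/bin/bash
# Sketch: map an initramfs file to the CONFIG_RD_* option the kernel needs
# to unpack it, by looking at the file's magic bytes. Only a few formats are
# recognized; everything else is reported as "unknown".

# Print the first N bytes of FILE as one lowercase hex string.
magic_hex() {
    head -c "$2" "$1" | od -An -v -tx1 | tr -d ' \n'
}

initrd_config_option() {
    case "$(magic_hex "$1" 6)" in
        1f8b*)        echo "CONFIG_RD_GZIP" ;;
        fd377a585a00) echo "CONFIG_RD_XZ" ;;
        425a68*)      echo "CONFIG_RD_BZIP2" ;;
        28b52ffd*)    echo "CONFIG_RD_ZSTD" ;;
        303730373031) echo "none (uncompressed newc cpio)" ;;
        *)            echo "unknown"; return 1 ;;
    esac
}
```

For the archive built later in this article, initrd_config_option initramfs.cpio.gz prints CONFIG_RD_GZIP.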

The next two configurations are important as they enable the binary loaders for ELF and script #! files.

CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
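Both loaders are needed in our setup: /init is a #! script, which requires CONFIG_BINFMT_SCRIPT, and the interpreter it names, /bin/sh, is a link to the statically linked busybox ELF binary, which requires CONFIG_BINFMT_ELF. The sketch below, using a hypothetical helper required_binfmt, reports which of the two loaders a file needs based on its first bytes.

```shell
#!/bin/bash
# Sketch: report which binary loader the kernel needs to execute FILE,
# based on its first bytes:
#   "#!"      -> script, needs CONFIG_BINFMT_SCRIPT
#   "\x7fELF" -> ELF binary, needs CONFIG_BINFMT_ELF
required_binfmt() {
    local magic
    magic=$(head -c 4 "$1" | od -An -v -tx1 | tr -d ' \n')
    case "$magic" in
        2321*)    echo "CONFIG_BINFMT_SCRIPT" ;;
        7f454c46) echo "CONFIG_BINFMT_ELF" ;;
        *)        echo "unknown"; return 1 ;;
    esac
}
```

For example, running it on the init script in busybox's _install directory prints CONFIG_BINFMT_SCRIPT, while running it on _install/bin/busybox prints CONFIG_BINFMT_ELF.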

Note: In the curses-based configuration menu (make menuconfig) we can search for configuration options with the / key and jump to a match using the number keys. After selecting a match, Help shows a description of the configuration option.

Building the kernel with the default make target will give us the following two files:

vmlinux statically linked kernel (ELF file) containing symbol information for debugging
arch/x86/boot/bzImage compressed kernel image for booting

Full configure & build script:

#!/bin/bash

set -e

LINUX=linux-6.6.2
wget https://cdn.kernel.org/pub/linux/kernel/v6.x/$LINUX.tar.xz
unxz $LINUX.tar.xz && tar xf $LINUX.tar

cd $LINUX

cat <<EOF > kernel_fragment.config
# 64bit kernel
CONFIG_64BIT=y
# enable support for compressed initrd (gzip)
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
# support for ELF and #! binary format
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
# /dev
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# tty & console
CONFIG_TTY=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
# pseudo fs
CONFIG_PROC_FS=y
CONFIG_SYSFS=y
# debugging
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO=y
## tinyconfig selects DEBUG_INFO_NONE; select the toolchain default DWARF
## version instead, otherwise DEBUG_INFO will not be enabled.
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_PRINTK=y
CONFIG_EARLY_PRINTK=y
EOF

make tinyconfig
./scripts/kconfig/merge_config.sh -n ./kernel_fragment.config
make -j$(nproc --ignore=2)
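merge_config.sh reports requested values that did not end up in the final .config, but the message is easy to miss in the output. Kconfig drops a requested option when its dependencies are not satisfied. Below is a minimal sketch of a re-check using a hypothetical helper check_fragment, which compares each CONFIG_...= line of the fragment verbatim against .config; lines of the form # CONFIG_... is not set are ignored.

```shell
#!/bin/bash
# Sketch: verify that every "CONFIG_FOO=value" line of a config fragment is
# present verbatim in the final .config. Kconfig drops requested options
# whose dependencies are not satisfied. Comment lines (including
# "# CONFIG_FOO is not set") and blank lines in the fragment are skipped.
check_fragment() {
    local fragment=$1 config=${2:-.config} missing=0 line
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line =~ ^CONFIG_[A-Za-z0-9_]+= ]] || continue
        if ! grep -qxF -- "$line" "$config"; then
            echo "missing: $line"
            missing=1
        fi
    done < "$fragment"
    return "$missing"
}
```

Run from the kernel tree after the build as check_fragment kernel_fragment.config .config; it prints one line per missing option and returns non-zero if any are missing.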

$> make initrd

The next step is to build the initrd, which we base on busybox. We first build busybox in its default configuration with a single change: we enable the following option to build a static binary, so that it can run stand-alone without any shared libraries in the initrd:

sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config
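The sed expression relies on busybox's defconfig containing the line # CONFIG_STATIC is not set; the space after CONFIG_STATIC keeps it from touching other options whose names merely start with CONFIG_STATIC. The sketch below, with a hypothetical helper enable_static, wraps the same sed call and fails if CONFIG_STATIC=y is not present afterwards, so a changed defconfig does not go unnoticed.

```shell
#!/bin/bash
# Sketch: enable CONFIG_STATIC in a busybox .config and verify the result.
# The space after CONFIG_STATIC in the pattern prevents matching options
# whose names only start with CONFIG_STATIC.
enable_static() {
    local config=${1:-.config}
    sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' "$config"
    # Fail loudly if the option was not present in the expected form.
    if ! grep -qx 'CONFIG_STATIC=y' "$config"; then
        echo "CONFIG_STATIC=y not set in $config" >&2
        return 1
    fi
}
```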

One important step before creating the final initrd is to create the init process. It is the first process executed in userspace after the kernel has finished its initialization. We use a small script that mounts the proc, sysfs and devtmpfs pseudo file systems and then drops us into a shell:

cat <<EOF > init
#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev

exec setsid cttyhack sh
EOF

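By default the kernel executes /sbin/init from the root file system; this can be changed with the init= kernel parameter. When booting from an initramfs, as we do here, the kernel first tries to run /init from the initramfs (this path can be changed with the rdinit= kernel parameter), which is why we place our script at /init.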
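A common reason for an initramfs that does not boot is an /init that is not executable, or a script whose #! interpreter does not exist inside the initramfs. The following is a minimal sketch of a check to run on the staging directory (busybox's _install) before packing it; the helper name check_init is hypothetical, and it assumes the #! line has no space after the #!.

```shell
#!/bin/bash
# Sketch: sanity-check an initramfs staging directory, such as busybox's
# _install, before packing it with cpio. The kernel can only start /init if
# it is executable and, for a script, if the interpreter named on its "#!"
# line exists inside the initramfs.
check_init() {
    local root=$1 first interp
    if [[ ! -x $root/init ]]; then
        echo "$root/init is missing or not executable"
        return 1
    fi
    IFS= read -r first < "$root/init"
    [[ $first == '#!'* ]] || return 0   # not a script (e.g. an ELF binary)
    interp=${first#'#!'}                # assumes no space after "#!"
    interp=${interp%% *}                # drop interpreter arguments
    # Relative symlinks (as created by busybox's "make install") resolve
    # inside $root, so -e also checks that the link target exists there.
    if [[ ! -e $root$interp ]]; then
        echo "interpreter $interp not found in $root"
        return 1
    fi
}
```

For the build script below, check_init _install would be run right before the find | cpio step.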

Full busybox & initrd build script:

#!/bin/bash

if test $(id -u) -ne 0; then
    SUDO=sudo
fi

set -e

BUSYBOX=busybox-1.36.1
INITRD=$PWD/initramfs.cpio.gz

## Build busybox

echo "[+] configure & build $BUSYBOX ..."
[[ ! -d $BUSYBOX ]] && {
    wget https://busybox.net/downloads/$BUSYBOX.tar.bz2
    bunzip2 $BUSYBOX.tar.bz2 && tar xf $BUSYBOX.tar
}

cd $BUSYBOX
make defconfig
sed -i 's/# CONFIG_STATIC .*/CONFIG_STATIC=y/' .config
make -j$(nproc --ignore=2) busybox
make install

## Create initrd

echo "[+] create initrd $INITRD ..."

cd _install

# 1. create initrd folder structure
mkdir -p bin sbin etc proc sys usr/bin usr/sbin dev

# 2. create init process
cat <<EOF > init
#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev

exec setsid cttyhack sh
EOF
chmod +x init

# 3. create compressed initrd
find . -print0                      \
    | cpio --null -ov --format=newc \
    | gzip -9 > $INITRD

Running QEMU && GDB

After finishing the previous steps we have all we need to run and debug the kernel. We have arch/x86/boot/bzImage and initramfs.cpio.gz to boot the kernel into a shell and we have vmlinux to feed the debugger with debug symbols.

We start QEMU as follows. Thanks to the -S flag, QEMU does not start executing the guest CPU until we tell it to continue from the debugger:

# -S    do not start the CPU until the debugger tells it to continue
> qemu-system-x86_64                                                 \
  -kernel ./linux-6.6.2/arch/x86/boot/bzImage                        \
  -nographic                                                         \
  -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug" \
  -initrd ./initramfs.cpio.gz                                        \
  -gdb tcp::1234                                                     \
  -S
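When switching kernel versions or ports, it can be handy to wrap this command in a small function. The following is a sketch with a hypothetical helper run_qemu that takes the kernel source directory, the initrd and the gdb port as arguments; the defaults assume the linux-6.6.2 tree and initramfs.cpio.gz from the build scripts, and DRY_RUN=1 (also an assumption of this sketch) only prints the command. nokaslr is kept so that, should the kernel be built with KASLR, the symbol addresses from vmlinux match the running kernel.

```shell
#!/bin/bash
# Sketch: start QEMU like above, with the kernel tree, initrd and gdb port as
# parameters. With DRY_RUN=1 the command is only printed, not executed.
run_qemu() {
    local linux=${1:-linux-6.6.2} initrd=${2:-initramfs.cpio.gz} port=${3:-1234}
    # -S keeps the CPU stopped until the debugger continues execution.
    local cmd=(
        qemu-system-x86_64
        -kernel "$linux/arch/x86/boot/bzImage"
        -nographic
        -append "earlyprintk=ttyS0 console=ttyS0 nokaslr init=/init debug"
        -initrd "$initrd"
        -gdb "tcp::$port"
        -S
    )
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%q ' "${cmd[@]}"
        echo
    else
        "${cmd[@]}"
    fi
}
```

For example, run_qemu linux-6.6.2 initramfs.cpio.gz 4321 starts QEMU with the gdb server on port 4321, so gdb would connect with target remote :4321.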

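Then we start GDB and connect to the GDB server running in QEMU (configured via -gdb tcp::1234). From now on we can debug the kernel: set breakpoints, step through the code and inspect its state. The session below was recorded with linux-5.3.7, so with newer kernels addresses, line numbers and call paths may differ.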

> gdb linux-5.3.7/vmlinux -ex 'target remote :1234'
(gdb) b do_execve
Breakpoint 1 at 0xffffffff810a1a60: file fs/exec.c, line 1885.
(gdb) c
Breakpoint 1, do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 <argv_init>, __envp=0xffffffff8181e040 <envp_init>) at fs/exec.c:1885
1885          return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
(gdb) bt
#0  do_execve (filename=0xffff888000060000, __argv=0xffffffff8181e160 <argv_init>, __envp=0xffffffff8181e040 <envp_init>) at fs/exec.c:1885
#1  0xffffffff81000498 in run_init_process (init_filename=<optimized out>) at init/main.c:1048
#2  0xffffffff81116b75 in kernel_init (unused=<optimized out>) at init/main.c:1129
#3  0xffffffff8120014f in ret_from_fork () at arch/x86/entry/entry_64.S:352
#4  0x0000000000000000 in ?? ()
(gdb)
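To replay a session like the one above without typing the commands each time, gdb can read commands from a file passed with -x. A minimal sketch; the helper write_gdb_script and the file name debug.gdb are hypothetical.

```shell
#!/bin/bash
# Sketch: write the commands of the session above into a gdb command file.
# Arguments: output file (default debug.gdb) and gdb port (default 1234).
write_gdb_script() {
    local out=${1:-debug.gdb} port=${2:-1234}
    cat > "$out" <<EOF
target remote :$port
break do_execve
continue
bt
EOF
}
```

With QEMU waiting on port 1234, write_gdb_script debug.gdb && gdb -q -x debug.gdb linux-6.6.2/vmlinux connects, runs to the breakpoint, prints the backtrace and then leaves us at the (gdb) prompt.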

Appendix: Try to get around `<optimized out>`

When debugging the kernel we often face the following situation in gdb:

(gdb) frame
#0  do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c

(gdb) info args
fd = <optimized out>
filename = 0xffff888000060000
argv = <optimized out>
envp = <optimized out>
flags = <optimized out>
file = 0x0

The reason is that the Linux kernel is always compiled with optimizations enabled and parts of it rely on them, so optimizations cannot simply be turned off for the whole kernel.

In this situation we can "try" to reduce the optimization level for single compilation units, a directory or a whole subtree ("try", because reducing the optimization level can break the build). To do so, we adapt the Makefile in the corresponding directory.

# fs/Makefile

# configure for a single compilation unit (fs/exec.c)
CFLAGS_exec.o := -Og

# configure for all compilation units in this directory
ccflags-y += -Og

# configure for this directory and all of its subdirectories
subdir-ccflags-y += -Og
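When lowering the optimization level for several files one after another, the per-file variable can be added from the shell. A minimal sketch with a hypothetical helper add_og: it derives the CFLAGS_<file>.o name from a source path and appends the line to the directory's Makefile only if it is not already there. It assumes the directory uses a file named Makefile (some kernel directories use Kbuild instead).

```shell
#!/bin/bash
# Sketch: append "CFLAGS_<file>.o += -Og" for a kernel source file to the
# Makefile in the same directory, unless the exact line already exists.
# Example: add_og fs/exec.c  appends  "CFLAGS_exec.o += -Og"  to fs/Makefile.
add_og() {
    local src=$1 dir base line
    dir=$(dirname "$src")
    base=$(basename "$src" .c)
    line="CFLAGS_$base.o += -Og"
    grep -qxF -- "$line" "$dir/Makefile" || echo "$line" >> "$dir/Makefile"
}
```

After add_og fs/exec.c, the object can be rebuilt on its own with make fs/exec.o before running the full build again; kbuild notices the changed flags and recompiles the file.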

After enabling -Og (optimize for debugging experience), we now see the following in gdb:

(gdb) frame
#0  do_execveat_common (fd=fd@entry=-100, filename=0xffff888000120000, argv=argv@entry=..., envp=envp@entry=..., flags=flags@entry=0) at fs/exec.c

(gdb) info args
fd = -100
filename = 0xffff888000120000
argv = {ptr = {native = 0x10c5980}}
envp = {ptr = {native = 0x10c5990}}
flags = 0

(gdb) p *filename
$3 = {name = 0xffff888000120020 "/bin/ls", uptr = 0x10c59b8 "/bin/ls", refcnt = 1, aname = 0x0, iname = 0xffff888000120020 "/bin/ls"}

(gdb) ptype filename
type = struct filename {
    const char *name;
    const char *uptr;
    int refcnt;
    struct audit_names *aname;
    const char iname[];
}

Appendix: `Dockerfile` for Kernel development

The following Dockerfile provides a development environment with all the tools and dependencies required to reproduce the steps of building and debugging the Linux kernel described above.

FROM ubuntu:20.04
LABEL maintainer="Johannes Stoelp <johannes.stoelp@gmail.com>"

RUN apt-get update                 \
 && DEBIAN_FRONTEND=noninteractive \
    apt-get install                \
      --yes                        \
      --no-install-recommends      \
      # Download & unpack.
      wget                         \
      ca-certificates              \
      xz-utils                     \
      bzip2                        \
      # Build tools & deps (kernel).
      make                         \
      bc                           \
      gcc g++                      \
      flex bison                   \
      libncurses-dev               \
      libelf-dev                   \
      # Build tools & deps (initrd).
      cpio                         \
      # Run & debug.
      qemu-system-x86              \
      gdb                          \
      cgdb                         \
      telnet                       \
      # Convenience.
      ripgrep                      \
      fd-find                      \
      neovim                       \
 && rm -rf /var/lib/apt/lists/*    \
 && apt-get clean

WORKDIR /develop

Save the listing above in a file called Dockerfile and build the docker image as follows.

docker build -t kernel-dev .

Optionally set DOCKER_BUILDKIT=1 to use the newer image builder.

Once the image has been built, an interactive container can be launched as follows.

# Some options for convenience:
#   -v <HOST>:<GUEST>     Mount host path to guest path.
#   --rm                  Remove the container after exiting.

docker run -it kernel-dev

Alternatively use podman.
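The -v and --rm options above can be combined into a small wrapper. A sketch with a hypothetical helper run_dev that mounts the current directory at /develop (the image's WORKDIR) and removes the container on exit; the CONTAINER_ENGINE and DRY_RUN variables are assumptions of this sketch, used to select podman instead of docker and to only print the command. Extra arguments are passed on as the command to run inside the container.

```shell
#!/bin/bash
# Sketch: start the kernel-dev image with the current directory mounted at
# /develop and remove the container on exit. CONTAINER_ENGINE selects the
# engine (default: docker); DRY_RUN=1 only prints the command.
run_dev() {
    local engine=${CONTAINER_ENGINE:-docker}
    local cmd=("$engine" run -it --rm -v "$PWD:/develop" kernel-dev "$@")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%q ' "${cmd[@]}"
        echo
    else
        "${cmd[@]}"
    fi
}
```

For example, CONTAINER_ENGINE=podman run_dev bash opens a shell in the container using podman, with the kernel and busybox trees from the host directory available under /develop.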

Appendix: Screencast of an example debug session

The screencast shows an example session debugging the Linux kernel in a container built from the Dockerfile mentioned above.