[root@memzero]# ls

2021/12/02 - QEMU virtio configurations

For my own reference I wanted to document some minimal virtio device configurations with qemu and the required Linux kernel configuration to enable those devices.

The devices we will use are virtio console, virtio blk and virtio net.

To make use of the virtio devices in qemu we are going to build and boot into a busybox based initramfs.

Build initramfs

For the initramfs there is not much magic: we grab a copy of busybox, configure it with the default config (defconfig) and enable static linking, as we will use it as the rootfs.
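
The busybox part could look like the following sketch (assuming busybox 1.34.1 here, any recent release should do):

curl -L -O https://busybox.net/downloads/busybox-1.34.1.tar.bz2
tar xf busybox-1.34.1.tar.bz2
cd busybox-1.34.1

# Default config with static linking enabled.
make defconfig
sed -i 's/# CONFIG_STATIC is not set/CONFIG_STATIC=y/' .config

# Build busybox and install the file tree into the initramfs root.
make -j$(nproc)
make CONFIG_PREFIX=../initramfs install
cd ../initramfs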

For the init process we will use the one provided by busybox, but we have to symlink it to /init, as during boot the kernel extracts the cpio compressed initramfs into the rootfs and looks for the /init file. If that is not found, the kernel falls back to an older mechanism and tries to mount a root partition (which we don't have).

Optionally the init binary could be specified with the rdinit= kernel boot parameter.
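
For example, a hypothetical append line (assuming busybox's init below /sbin):

-append "console=ttyS0 rdinit=/sbin/init"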

We populate the /etc/inittab and /etc/init.d/rcS with a minimal configuration to mount the proc, sys and dev filesystems and drop into a shell after the boot is completed.
Additionally we set up /etc/passwd and /etc/shadow with an entry for the root user with the password 1234, so we can log in via the virtio console later.

ln -sfn sbin/init init

cat <<EOF > etc/inittab
# Initialization after boot.
::sysinit:/etc/init.d/rcS

# Shell on console after user presses key.
::askfirst:/bin/cttyhack /bin/sh -l

# Spawn getty on first virtio console.
::respawn:/sbin/getty hvc0 9600 vt100
EOF

cat <<EOF > etc/init.d/rcS
#!/bin/sh

# Mount devtmpfs, which automatically populates /dev with device nodes.
# So no mknod for our experiments :}
mount -t devtmpfs none /dev

# Mount procfs and sysfs.
mount -t proc none /proc
mount -t sysfs none /sys

# Set hostname.
hostname virtio-box
EOF
chmod +x etc/init.d/rcS

cat <<EOF > etc/profile
export PS1="\[\e[31m\e[1m\]\u@\h\[\e[0m\] \w # "
EOF

# 3. Create minimal passwd db with 'root' user and password '1234'.
#    Mainly used for login on the virtual console in these experiments.
echo "root:x:0:0:root:/:/bin/sh" > etc/passwd
echo "root:$(openssl passwd -crypt 1234):0::::::" > etc/shadow

The full build script is available under build_initramfs.sh.

Virtio console

To enable support for the virtio console we enable the kernel configs shown below. The pci configurations are enabled because in qemu the virtio console front-end device (the one presented to the guest) is attached to the pci bus.

# Enable support for virtio pci.
CONFIG_PCI=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y

# Enable virtio console driver.
CONFIG_VIRTIO_CONSOLE=y

The full build script is available under build_kernel.sh.
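
A sketch of the kernel build, assuming the configs above are kept in a fragment file extra.config (the file name is made up) and merged on top of the x86_64 defconfig:

make x86_64_defconfig
./scripts/kconfig/merge_config.sh -m .config extra.config
make olddefconfig
make -j$(nproc) bzImage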

To boot-up the guest we use the following qemu configuration.

qemu-system-x86_64                                            \
  -nographic                                                  \
  -cpu host                                                   \
  -enable-kvm                                                 \
  -kernel ./linux-$(VER)/arch/x86/boot/bzImage                \
  -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/ram0 ro" \
  -initrd ./initramfs.cpio.gz                                 \
  -device virtio-serial-pci                                   \
  -device virtconsole,chardev=vcon,name=console.0             \
  -chardev socket,id=vcon,ipv4=on,host=localhost,port=2222,server,telnet=on,wait=off

The important parts in this configuration are the last three lines.

The virtio-serial-pci device creates the serial bus where the virtio console is attached to.

The virtconsole creates the virtio console device exposed to the guest (front-end). The chardev=vcon option specifies that the chardev with id=vcon is attached as back-end to the virtio console. The back-end device is the one we will have access to from the host running the emulation.

We configure the chardev back-end to be a socket running a telnet server that listens on port 2222. The wait=off option tells qemu to boot directly without waiting for a client connection.

After booting the guest we are dropped into a shell and can verify that our device is being detected properly.

root@virtio-box ~ # ls /sys/bus/virtio/devices/
virtio0
root@virtio-box ~ # cat /sys/bus/virtio/devices/virtio0/virtio-ports/vport0p0/name
console.0

In /etc/inittab we already configured getty to spawn on the first hypervisor console /dev/hvc0. This effectively runs login(1) over the serial console.

From the host we can therefore connect to the back-end chardev with telnet localhost 2222 and log in to the guest with root:1234.

> telnet -4 localhost 2222
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

virtio-box login: root
Password:
root@virtio-box ~ #

Virtio blk

To enable support for the virtio block device we enable the kernel configs shown below. First we enable general support for block devices and then for virtio block devices. Additionally we enable support for the ext2 filesystem because we are creating an ext2 filesystem to back the virtio block device.

# Enable support for block devices.
CONFIG_BLK_DEV=y

# Enable virtio blk driver.
CONFIG_VIRTIO_BLK=y

# Enable support for ext2 filesystems.
CONFIG_EXT2_FS=y

The full build script is available under build_kernel.sh.

Next we create the ext2 filesystem image. We do this by creating a 128M blob and formatting it with ext2. Then we can mount the image via a loop device and populate the filesystem.

dd if=/dev/zero of=blk.ext2 bs=1M count=128
mkfs.ext2 blk.ext2
sudo mount -t ext2 -o loop blk.ext2 /mnt
echo world | sudo tee /mnt/hello
sudo umount /mnt

Before booting the guest we attach the virtio block device to the VM by adding the -drive option to our previous qemu invocation.

qemu-system-x86_64 \
  ...
  -drive if=virtio,file=blk.ext2,format=raw

The -drive option is a shortcut for a -device (front-end) / -blockdev (back-end) pair.

The if=virtio flag specifies the interface of the front-end device to be virtio.

The file and format flags configure the back-end to be a disk image.
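
For illustration, the -drive shortcut above roughly corresponds to an explicit front-end / back-end pair like the following (the node-name is made up):

qemu-system-x86_64 \
  ...
  -blockdev driver=raw,node-name=disk0,file.driver=file,file.filename=blk.ext2 \
  -device virtio-blk-pci,drive=disk0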

After booting the guest we are dropped into a shell and can verify a few things. First we check if the virtio block device is detected, then we check if we have support for the ext2 filesystem and finally we mount the disk.

root@virtio-box ~ # ls -l /sys/block/
lrwxrwxrwx 1 root 0 0 Dec  3 22:46 vda -> ../devices/pci0000:00/0000:00:05.0/virtio1/block/vda

root@virtio-box ~ # cat /proc/filesystems
...
       ext2

root@virtio-box ~ # mount -t ext2 /dev/vda /mnt
EXT2-fs (vda): warning: mounting unchecked fs, running e2fsck is recommended
ext2 filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff)

root@virtio-box ~ # cat /mnt/hello
world

Virtio net

To enable support for the virtio network device we enable the kernel configs shown below. First we enable general support for networking and TCP/IP and then enable the core networking driver and the virtio net driver.

# Enable general networking support.
CONFIG_NET=y

# Enable support for TCP/IP.
CONFIG_INET=y

# Enable support for network devices.
CONFIG_NETDEVICES=y

# Enable networking core drivers.
CONFIG_NET_CORE=y

# Enable virtio net driver.
CONFIG_VIRTIO_NET=y

The full build script is available under build_kernel.sh.

For the qemu device emulation we already decided on the front-end device, which will be our virtio net device.
On the back-end we choose the user option. This enables a network stack implemented in userspace based on libslirp, which has the benefit that we neither need to set up additional network interfaces nor require any privileges. Fundamentally, libslirp works by replaying the Layer 2 packets received from the guest NIC via the socket API on the host (Layer 4) and vice versa. User networking comes with a set of limitations, for example ICMP traffic does not work out of the box (so no ping) and the guest is not directly reachable from the host unless ports are explicitly forwarded.
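
As a sketch, a service in the guest could be made reachable by adding an explicit hostfwd rule to the back-end, here forwarding host port 10022 to guest port 22 (id and ports chosen arbitrarily):

-netdev user,id=net0,hostfwd=tcp::10022-:22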

With the guest, qemu and the host in the picture this looks something like the following.

+--------------------------------------------+
|                                       host |
|     +-------------------------+            |
|     | guest                   |            |
|     |                         |            |
|     |                    user |            |
|     +------+------+-----------+            |
|     |      | eth0 |    kernel |            |
|     |      +--+---+           |            |
|     |         |               |            |
|     |   +-----v--------+      |            |
|     |   | nic (virtio) |      |            |
|  +--+---+-----+--------+------+--+         |
|  |            | Layer 2     qemu |         |
|  |            | (eth frames)     |         |
|  |       +----v-----+            |         |
|  |       | libslirp |            |         |
|  |       +----+-----+            |         |
|  |            | Layer 4          |         |
|  |            | (socket API)     |    user |
+--+---------+--v---+--------------+---------+
|            | eth0 |                 kernel |
|            +------+                        |
+--------------------------------------------+

The user networking implements a virtual NAT'ed sub-network with the address range 10.0.2.0/24 and runs an internal dhcp server. By default, the following IP addresses of the sub-network are interesting to us:

10.0.2.2   the host running the qemu emulation (default gateway)
10.0.2.3   the virtual DNS server
10.0.2.15  the first IP address the dhcp server hands out to the guest

The netdev options net=addr/mask, host=addr, dns=addr can be used to re-configure the sub-network (see network options).
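
For example, a back-end with a re-configured sub-network could look like this (id and values are arbitrary):

-netdev user,id=net0,net=192.168.76.0/24,host=192.168.76.2,dns=192.168.76.3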

With the details of the sub-network in mind, we can extend the initramfs with some additional steps to perform the basic network setup.

We add the virtual DNS server to /etc/resolv.conf which will be used by the libc resolver functions.

Additionally we assign a static IP address to the eth0 network interface, bring the interface up and define the default route via the host 10.0.2.2.

# 4. Create minimal setup for basic networking.

# Virtual DNS from qemu user network.
echo "nameserver 10.0.2.3" > etc/resolv.conf

# Assign static IP address, bring-up interface and define default route.
cat <<EOF >> etc/init.d/rcS
# Assign static IP address to eth0 interface.
ip addr add 10.0.2.15/24 dev eth0

# Bring up eth0 interface.
ip link set dev eth0 up

# Add default route via the host (qemu user networking exposes host at this
# address by default).
ip route add default via 10.0.2.2
EOF

The full build script is available under build_initramfs.sh.

Before booting the guest we attach the virtio net device and configure it to use the user network stack by adding the -nic option to our previous qemu invocation.

qemu-system-x86_64 \
  ...
  -nic user,model=virtio-net-pci

The -nic option is a shortcut for a -device (front-end) / -netdev (back-end) pair.
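
Written out explicitly, the -nic line above roughly corresponds to the following pair (the id is made up):

qemu-system-x86_64 \
  ...
  -netdev user,id=net0 \
  -device virtio-net-pci,netdev=net0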

After booting the guest we are dropped into a shell and can verify a few things. First we check if the virtio net device is detected. Then we check if the interface got configured and brought up correctly.

root@virtio-box ~ # ls -l /sys/class/net/
lrwxrwxrwx 1 root 0 0 Dec  4 16:56 eth0 -> ../../devices/pci0000:00/0000:00:03.0/virtio0/net/eth0
lrwxrwxrwx 1 root 0 0 Dec  4 16:56 lo -> ../../devices/virtual/net/lo

root@virtio-box ~ # ip -o a
2: eth0    inet 10.0.2.15/24 scope global eth0  ...

root@virtio-box ~ # ip route
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 scope link  src 10.0.2.15

We can resolve our domain and see that the virtual DNS server gets contacted.

root@virtio-box ~ # nslookup memzero.de
Server:   10.0.2.3
Address:  10.0.2.3:53

Non-authoritative answer:
Name:    memzero.de
Address: 46.101.148.203

Additionally we can try to access a service running on the host. For that we run a simple http server on the host (where we launched qemu), listening on all addresses at port 1234.

python3 -m http.server --bind 0.0.0.0 1234

From within the guest we can manually craft a simple http GET request and send it to the http server running on the host. For that we use the IP address 10.0.2.2, at which the user networking exposes the host by default.

root@virtio-box ~ # echo "GET / HTTP/1.0" | nc 10.0.2.2 1234
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.9.7
Date: Sat, 04 Dec 2021 16:58:56 GMT
Content-type: text/html; charset=utf-8
Content-Length: 917

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
<li><a href="build_initramfs.sh">build_initramfs.sh</a></li>
...
</ul>
<hr>
</body>
</html>

Appendix: Workspace

To re-produce the setup and play around with it just grab a copy of the following files:

build_initramfs.sh
build_kernel.sh
Dockerfile
Makefile

Then run the following steps to build everything. The prefixes [H] and [C] indicate whether a command is run on the host or inside the container, respectively.

# To see all the make targets.
[H]: make help

# Build docker image, start a container with the current working dir
# mounted. On the first invocation this takes some minutes to build
# the image.
[H]: make docker

# Build kernel and initramfs.
[C]: make

# Build ext2 fs as virtio blkdev backend.
[H]: make ext2

# Start qemu guest.
[H]: make run