For my own reference I wanted to document some minimal virtio
device configurations with qemu and the required Linux kernel configuration to
enable those devices.
The devices we will use are virtio console, virtio blk and virtio net.
To make use of the virtio devices in qemu we are going to build and boot into a
busybox based initramfs.
Build initramfs
For the initramfs there is not much magic: we will grab a copy of busybox,
configure it with the default config (defconfig) and enable static linking, as
we will use it as the rootfs.
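A rough sketch of the busybox build steps (the version and the initramfs directory name are assumptions for illustration, the full build script below is authoritative):

wget https://busybox.net/downloads/busybox-1.35.0.tar.bz2
tar xf busybox-1.35.0.tar.bz2
cd busybox-1.35.0
make defconfig
# Enable static linking so the resulting binaries have no runtime dependencies.
sed -i 's/# CONFIG_STATIC is not set/CONFIG_STATIC=y/' .config
make -j$(nproc)
# Install the busybox binary and all applet symlinks into the initramfs dir.
make CONFIG_PREFIX=../initramfs install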
For the init process we will use the one provided by busybox, but we have to
symlink it to /init, as during boot the kernel will extract the cpio
compressed initramfs into the rootfs and look for the /init file. If that's not
found, the kernel will fall back to an older mechanism and try to mount a root
partition (which we don't have).
Optionally the init binary could be specified with the rdinit= kernel boot parameter.
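For example, if the init binary lived at a different location inside the initramfs, the kernel command line could roughly look like this (the path is purely for illustration):

# Hypothetical: use /sbin/init from the initramfs instead of /init.
-append "console=ttyS0 rdinit=/sbin/init"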
We populate /etc/inittab and /etc/init.d/rcS with a minimal configuration to
mount the proc, sys and dev filesystems and drop into a shell after the boot
is completed.
Additionally we set up /etc/passwd and /etc/shadow with an entry for the root
user with the password 1234, so we can login via the virtio console later.
ln -sfn sbin/init init
cat <<EOF > etc/inittab
# Initialization after boot.
::sysinit:/etc/init.d/rcS
# Shell on console after user presses key.
::askfirst:/bin/cttyhack /bin/sh -l
# Spawn getty on first virtio console.
::respawn:/sbin/getty hvc0 9600 vt100
EOF
cat <<EOF > etc/init.d/rcS
#!/bin/sh
# Mount devtmpfs, which automatically populates /dev with device nodes.
# So no mknod for our experiments :}
mount -t devtmpfs none /dev
# Mount procfs and sysfs.
mount -t proc none /proc
mount -t sysfs none /sys
# Set hostname.
hostname virtio-box
EOF
chmod +x etc/init.d/rcS
cat <<EOF > etc/profile
export PS1="\[\e[31m\e[1m\]\u@\h\[\e[0m\] \w # "
EOF
# 3. Create minimal passwd db with 'root' user and password '1234'.
# Mainly used for login on the virtual console in these experiments.
echo "root:x:0:0:root:/:/bin/sh" > etc/passwd
echo "root:$(openssl passwd -crypt 1234):0::::::" > etc/shadow
The full build script is available under build_initramfs.sh.
Virtio console
To enable support for the virtio console we enable the kernel configs shown below. The pci configurations are enabled because in qemu the virtio console front-end device (the one presented to the guest) is attached to the pci bus.
# Enable support for virtio pci.
CONFIG_PCI=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
# Enable virtio console driver.
CONFIG_VIRTIO_CONSOLE=y
The full build script is available under build_kernel.sh.
To boot-up the guest we use the following qemu configuration.
qemu-system-x86_64 \
-nographic \
-cpu host \
-enable-kvm \
-kernel ./linux-$(VER)/arch/x86/boot/bzImage \
-append "earlyprintk=ttyS0 console=ttyS0 root=/dev/ram0 ro" \
-initrd ./initramfs.cpio.gz \
-device virtio-serial-pci \
-device virtconsole,chardev=vcon,name=console.0 \
-chardev socket,id=vcon,ipv4=on,host=localhost,port=2222,server=on,telnet=on,wait=off
The important parts in this configuration are the last three lines.
The virtio-serial-pci device creates the serial bus to which the virtio console
is attached.
The virtconsole device creates the virtio console exposed to the guest
(front-end). The chardev=vcon option specifies that the chardev with id=vcon is
attached as back-end to the virtio console. The back-end device is the one we
will have access to from the host running the emulation.
We configure the chardev back-end to be a socket, running a telnet server
listening on port 2222. The wait=off option tells qemu that it can boot
directly without waiting for a client connection.
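The chardev back-end is interchangeable. As a sketch (not used in the rest of this post), the console could just as well be exposed as a unix socket on the host:

# Alternative back-end: unix socket instead of the telnet server.
# On the host it can then be reached e.g. with: socat - UNIX-CONNECT:/tmp/vcon.sock
-chardev socket,id=vcon,path=/tmp/vcon.sock,server=on,wait=off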
After booting the guest we are dropped into a shell and can verify that our device is being detected properly.
root@virtio-box ~ # ls /sys/bus/virtio/devices/
virtio0
root@virtio-box ~ # cat /sys/bus/virtio/devices/virtio0/virtio-ports/vport0p0/name
console.0
In /etc/inittab, we already configured getty to be spawned on the first
hypervisor console /dev/hvc0. This will effectively run login(1) over the
serial console.
From the host we can therefore connect to the back-end chardev with telnet
localhost 2222 and login to the guest with root:1234.
> telnet -4 localhost 2222
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
virtio-box login: root
Password:
root@virtio-box ~ #
Virtio blk
To enable support for the virtio block device we enable the kernel configs
shown below.
First we enable general support for block devices and then for virtio block
devices. Additionally we enable support for the ext2 filesystem, because we are
creating an ext2 filesystem to back the virtio block device.
# Enable support for block devices.
CONFIG_BLK_DEV=y
# Enable virtio blk driver.
CONFIG_VIRTIO_BLK=y
# Enable support for ext2 filesystems.
CONFIG_EXT2_FS=y
The full build script is available under build_kernel.sh.
Next we create the ext2 filesystem image. We do this by creating a 128M blob
and formatting it with ext2 afterwards. Then we can mount the image via a loop
device and populate the filesystem.
dd if=/dev/zero of=blk.ext2 bs=1M count=128
mkfs.ext2 blk.ext2
sudo mount -t ext2 -o loop blk.ext2 /mnt
echo world | sudo tee /mnt/hello
sudo umount /mnt
Before booting the guest we will attach the virtio block device to the VM.
To do so, we add the -drive configuration to our previous qemu invocation.
qemu-system-x86_64 \
...
-drive if=virtio,file=blk.ext2,format=raw
The -drive option is a shortcut for a -device (front-end) / -blockdev (back-end)
pair.
The if=virtio flag specifies the interface of the front-end device to be virtio.
The file and format flags configure the back-end to be a disk image.
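Spelled out explicitly, the -drive line above corresponds roughly to a pair like the following (the node name disk0 is arbitrary, this sketch is not part of the build scripts):

qemu-system-x86_64 \
    ...
    -blockdev driver=file,node-name=disk0,filename=blk.ext2 \
    -device virtio-blk-pci,drive=disk0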
After booting the guest we are dropped into a shell and can verify a few things. First we check if the virtio block device is detected, then we check if we have support for the ext2 filesystem and finally we mount the disk.
root@virtio-box ~ # ls -l /sys/block/
lrwxrwxrwx 1 root 0 0 Dec 3 22:46 vda -> ../devices/pci0000:00/0000:00:05.0/virtio1/block/vda
root@virtio-box ~ # cat /proc/filesystems
...
ext2
root@virtio-box ~ # mount -t ext2 /dev/vda /mnt
EXT2-fs (vda): warning: mounting unchecked fs, running e2fsck is recommended
ext2 filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff)
root@virtio-box ~ # cat /mnt/hello
world
Virtio net
To enable support for the virtio network device we enable the kernel configs shown below. First we enable general support for networking and TCP/IP and then enable the core networking driver and the virtio net driver.
# Enable general networking support.
CONFIG_NET=y
# Enable support for TCP/IP.
CONFIG_INET=y
# Enable support for network devices.
CONFIG_NETDEVICES=y
# Enable networking core drivers.
CONFIG_NET_CORE=y
# Enable virtio net driver.
CONFIG_VIRTIO_NET=y
The full build script is available under build_kernel.sh.
For the qemu device emulation we already decided on the front-end device, which
will be our virtio net device.
On the back-end we will choose the user option. This enables a network stack
implemented in userspace based on libslirp, which has the benefit that we do
not need to set up additional network interfaces and therefore do not require
any privileges. Fundamentally, libslirp works by relaying Layer 2 packets
received from the guest NIC via the socket API on the host (Layer 4) and vice
versa. User networking comes with a set of limitations, for example:
- We can not use ping inside the guest, as ICMP is not supported.
- The guest is not accessible from the host (port forwarding, sketched below,
  can work around this to some extent).
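A sketch of such a port forward (the guest service on port 8080 is purely hypothetical, we don't run one in this post):

# Forward TCP port 8080 on the host to port 8080 of a hypothetical guest service.
-nic user,model=virtio-net-pci,hostfwd=tcp::8080-:8080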
With the guest, qemu and the host in the picture this looks something like the following.
+--------------------------------------------+
| host |
| +-------------------------+ |
| | guest | |
| | | |
| | user | |
| +------+------+-----------+ |
| | | eth0 | kernel | |
| | +--+---+ | |
| | | | |
| | +-----v--------+ | |
| | | nic (virtio) | | |
| +--+---+-----+--------+------+--+ |
| | | Layer 2 qemu | |
| | | (eth frames) | |
| | +----v-----+ | |
| | | libslirp | | |
| | +----+-----+ | |
| | | Layer 4 | |
| | | (socket API) | user |
+--+---------+--v---+--------------+---------+
| | eth0 | kernel |
| +------+ |
+--------------------------------------------+
The user networking implements a virtual NAT'ed sub-network with the address
range 10.0.2.0/24, running an internal dhcp server. By default, the following
addresses of the sub-network are interesting to us:
- 10.0.2.2: host running the qemu emulation
- 10.0.2.3: virtual DNS server
The netdev options net=addr/mask, host=addr and dns=addr can be used to
re-configure the sub-network (see network options).
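For example, the sub-network could be moved to a different range roughly like this (the addresses are chosen arbitrarily for illustration):

-nic user,model=virtio-net-pci,net=192.168.76.0/24,host=192.168.76.2,dns=192.168.76.3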
With the details of the sub-network in mind we can add some additional setup
to the initramfs which performs the basic network configuration.
We add the virtual DNS server to /etc/resolv.conf, which will be used by the
libc resolver functions.
Additionally we assign a static IP address to the eth0 network interface, bring
the interface up and define the default route via the host 10.0.2.2.
# 4. Create minimal setup for basic networking.
# Virtual DNS from qemu user network.
echo "nameserver 10.0.2.3" > etc/resolv.conf
# Assign static IP address, bring-up interface and define default route.
cat <<EOF >> etc/init.d/rcS
# Assign static IP address to eth0 interface.
ip addr add 10.0.2.15/24 dev eth0
# Bring up eth0 interface.
ip link set dev eth0 up
# Add default route via the host (qemu user networking exposes host at this
# address by default).
ip route add default via 10.0.2.2
EOF
The full build script is available under build_initramfs.sh.
Before booting the guest we will attach the virtio net device and configure it
to use the user network stack.
To do so, we add the -nic configuration to our previous qemu invocation.
qemu-system-x86_64 \
...
-nic user,model=virtio-net-pci
The -nic option is a shortcut for a -device (front-end) / -netdev (back-end)
pair.
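Written out explicitly, it corresponds roughly to the following pair (the id net0 is arbitrary, this sketch is not part of the build scripts):

qemu-system-x86_64 \
    ...
    -netdev user,id=net0 \
    -device virtio-net-pci,netdev=net0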
After booting the guest we are dropped into a shell and can verify a few things. First we check if the virtio net device is detected. Then we check if the interface got configured and brought up correctly.
root@virtio-box ~ # ls -l /sys/class/net/
lrwxrwxrwx 1 root 0 0 Dec 4 16:56 eth0 -> ../../devices/pci0000:00/0000:00:03.0/virtio0/net/eth0
lrwxrwxrwx 1 root 0 0 Dec 4 16:56 lo -> ../../devices/virtual/net/lo
root@virtio-box ~ # ip -o a
2: eth0 inet 10.0.2.15/24 scope global eth0 ...
root@virtio-box ~ # ip route
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 scope link src 10.0.2.15
We can resolve our domain and see that the virtual DNS gets contacted.
root@virtio-box ~ # nslookup memzero.de
Server: 10.0.2.3
Address: 10.0.2.3:53
Non-authoritative answer:
Name: memzero.de
Address: 46.101.148.203
Additionally we can try to access a service running on the host. For that we
run a simple http server on the host (where we launched qemu) with the command
python3 -m http.server --bind 0.0.0.0 1234. This launches the server listening
on any incoming address at port 1234.
From within the guest we can manually craft a simple http GET request and send
it to the http server running on the host. For that we use the IP address
10.0.2.2, at which the qemu user networking exposes the host.
root@virtio-box ~ # echo "GET / HTTP/1.0" | nc 10.0.2.2 1234
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.9.7
Date: Sat, 04 Dec 2021 16:58:56 GMT
Content-type: text/html; charset=utf-8
Content-Length: 917
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
<li><a href="build_initramfs.sh">build_initramfs.sh</a></li>
...
</ul>
<hr>
</body>
</html>
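Alternatively, since the busybox defconfig includes a wget applet, the same page can be fetched without crafting the request by hand; a sketch:

root@virtio-box ~ # wget -q -O - http://10.0.2.2:1234/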
Appendix: Workspace
To reproduce the setup and play around with it, just grab a copy of the following files:
Then run the following steps to build everything. The prefixes [H] and [C]
indicate whether a command is run on the host or inside the container,
respectively.
# To see all the make targets.
[H]: make help
# Build docker image, start a container with the current working dir
# mounted. On the first invocation this takes some minutes to build
# the image.
[H]: make docker
# Build kernel and initramfs.
[C]: make
# Build ext2 fs as virtio blkdev backend.
[H]: make ext2
# Start qemu guest.
[H]: make run