[root@memzero]# ls

2024/12/20 - xpost: x64 jit assembler (juicebox-asm)

This is a cross post to a repository implementing an x64 jit assembler, which I used to study the x86 instruction encoding and how to write such a tool in rust. The code is available on github >> juicebox-asm <<.

The assembler only implements a small set of the x86 instruction set, but enough to create some interesting examples. Additionally, the repository provides a simple runtime based on mmap(2) (linux only), which is used to execute the dynamically assembled code and the provided examples.

Assembler

The assembler implements all the supported instructions and provides a similar interface as when writing assembly code by hand.

let mut asm = Asm::new();
asm.xor(rax, rax);
asm.add(rax, rdi);
asm.add(rax, Mem64::indirect_base_index(rdi, rsi));
asm.ret();

asm.disasm();
// Outputs:
//   00000000  4831C0    xor rax,rax
//   00000003  4801F8    add rax,rdi
//   00000006  48030437  add rax,[rdi+rsi]
//   00000006  C3        ret

An interesting design-choice is given by the implementation of the assembler API for the various instructions, since x86 overloads instruction mnemonics with many operand types. This can be seen in the code listing above where the add instruction is used with both register-register and register-memory operands. It becomes even clearer when looking into the instruction reference manual, in the example of the add instruction.

Table taken from Intel® 64 and IA-32 Architectures Software Developer’s Manual - Volume 2 Instruction Set Reference.

One approach is to implement the functions with explicit types for the operands. This has the benefit that the API clearly expresses which operands are allowed for a given instruction and the compiler can check if wrong operands are provided. However, since rust does not have function overloading 1 as in cpp, one does need to define functions with different names to implement the instructions. The following shows an example for the add instruction with different operands.

// Pseudo code.
fn add_r64_r64(Reg64, Reg64)
fn add_r64_m64(Reg64, Mem64)
// ..

Another approach is to provided a flexible Operand type and implement the instructions with that. The benefit here is that there is only one add function. The downside however, is that the API does not clearly express which operands are allowed and the operand check must be done at runtime rather than compile-time.

// Pseudo code.
fn add(Operand, Operand)

During this experiment I have learned that there is a way in rust to emulate sort of function overloading, which allows to get all the benefits of the above discussed approaches while getting rid of all the downsides at the same time at the expense of a little boiler-plate code.

The idea is to define generic traits for the various instructions with overloaded mnemonics. The listing below shows the Add trait as example for the add instruction.

pub trait Add<T, U> {
    fn add(&mut self, op1: T, op2: U);
}

This in turn allows to implement the traits for all the different combinations of (explicit) operands that should be supported by each instruction. It gives fine control of what to support and provides an ergonomic API as seen in the first example code listing.

impl Add<Reg64, Reg64> for Asm {
    fn add(&mut self, op1: Reg64, op2: Reg64) {
        self.encode_rr(&[0x01], op1, op2);
    }
}

impl Add<Reg64, Mem64> for Asm {
    fn add(&mut self, op1: Reg64, op2: Mem64) {
        self.encode_rm(0x03, op1, op2);
    }
}

All instructions currently implemented with their variations can be found in the documentation of the Asm assembler.

Runtime

With the help of the provided runtime, code assembled with the assembler can be put into executable memory and turned into a function pointer, allowing to call into the generated code.

extern "C" fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    // SystemV abi:
    //   rdi -> first argument
    //   rsi -> second argument
    //   rax -> return value

    let mut asm = Asm::new();
    asm.mov(rsi, Imm64::from(42));
    asm.mov(rax, Imm64::from(add as usize));
    asm.call(rax);
    asm.ret();

    let mut rt = Runtime::new();
    let add42 = unsafe { rt.add_code::<extern "C" fn(u32) -> u32>(asm.into_code()) };

    let res = add42(5);
    assert_eq!(res, 47);
}

This example must be run on a system with the SystemV ABI.

Label

A label is used to associate a location in the generated code with a symbolic name (the label), which can then be used as branch target for control-flow instructions as for example jz, jnz.

The act of associating a label with a location is called binding. Each label can only be bound once, else the branch target would become ambiguous.

Binding and jumping to a label can come in two orders.

  1. A label is first bound and then jumped to (backwards-reference).
  2. A label is first jumped to and then bound (forwards-reference).

In the following both cases will be discussed in a bit more detail, which hopefully helps when browsing the implementation. One thing to keep in mind, since the generated code must be relocatable, all jump instructions will be encoded relative to the current program counter (pc).

The first case where the label is first bound and later used, is somewhat more natural to think about, as first defining then using something is pretty clear. The code listing below shows an example of a loop with a back-reference summing up all integer between 0 and 42.

1let mut asm = Asm::new();
2let mut lp = Label::new();
3
4asm.mov(eax, Imm32::from(0)); // int ret = 0;
5asm.mov(ecx, Imm32::from(42)); // int n = 42;
6
7asm.bind(&mut lp); // lp:
8asm.add(eax, ecx); // ret += n;
9asm.dec(ecx); // --n;
10asm.test(ecx, ecx); // if (n != 0)
11asm.jnz(&mut lp); // goto lp
12
13asm.ret();

From the point of view of the code buffer this looks as follows. When the label is bound (line 7), the current offset into the code buffer is recorded as the labels location. Then when emitting the jump instruction (line 11) later, this location is used with the current code buffers offset to compute the displacement (disp32) forming the relative jump.

The second case where the label is first used and then bound later, is not really more complex, but one has to postpone the computation of the displacement until the label is bound. The code listing below gives an example, demonstrating forwards jumps by validating two input arguments to be not equal to 0 and returning if one check fails.

1let mut asm = Asm::new();
2let mut ret = Label::new();
3
4// Validate inputs (sysv abi).
5asm.test(rdi, rdi); // if (rdi == 0)
6asm.jz(&mut ret); // return;
7asm.test(rsi, rsi); // if (rsi == 0)
8asm.jz(&mut ret); // return;
9
10// Here, rdi != 0 && rsi != 0.
11// nop as placeholder for some real work.
12asm.nop();
13
14asm.bind(&mut ret);
15asm.ret();

Again from the point of view of the code buffer this looks as follows. When the jump instructions are emitted (line 6, 8), the displacement can not be computed, as the label has no defined location yet. A placeholder displacement of value 0 is emitted into the code buffer and the offset to the displacement value is recorded by the label as pending relocation. Once the label is bound to a location, all pending relocations are resolved by consuming the recorded offsets, computing the corresponding displacements and patching the values in the code buffer.

Example: bf

The bf.rs example implements a jit compiler for the brainfuck language. In this example the bf program is jit compiled to one code blob as a whole and evaluated then after. The example only returns from the jit code if the bf program terminates or a runtime error is detected. For reference an interpreter is also provided.

Example: tiny-vm

The tiny_vm.rs example defines a tiny virtual machine (TinyVm) with a set of registers and custom instructions. It then goes on and implements a jit compiler for the TinyVm. In comparison to the jit compiler in the previous bf example, the TinyVm compiler jit compiles the program per basic-block (bb), each time a new bb is encountered.
After executing a jit compiled basic block, the code returns into the TinyVm. The vm then checks if the branch target has an entry in the jit cache, which maps guest addresses to compiled basic-blocks. If the cache returns a hit the TinyVm jumps to the generated code, if the cache returns a miss the new bb is compiled and inserted into the jit cache.

Similar to the bf example, this example also provides an interpreter as reference.

Appendix: x86 instruction encoding

x86 instructions are encoded with variable length and follow the format below. A single instruction can be as small as 1 byte up to a double-digit number of bytes.

Figure taken from Intel® 64 and IA-32 Architectures Software Developer’s Manual - Volume 2 Instruction Set Reference.

Prefixes are typically modifiers for the instruction, where some examples are LOCK for bus locking, REP for instruction repetition and REX for specifying extended operand sizes and to access extended registers (like r8-r15).

The ModR/M and SIB bytes are used to encode the register and memory operands of the instruction in the general case. Whether the SIB byte is used depends on the addressing mode.

In case the instruction also requires an address displacement or an immediate value, the bytes of the value directly follow the instruction bytes.

Historically, the x86 ISA only had 8 general-purpose registers (ax, cx, dx, bx, sp, bp, si, di). This is still visible today in various fields which encode register operands, where one such example is the ModRM.Reg field. Later on, 8 additional general-purpose registers (r8-r15) were added and one additional bit was needed to encode all 16 registers, while keeping backwards compatibility.
As mentioned above, the REX prefix is used to access the extended registers. This prefix contains the R, X, B bits, which are combined with the various register operand fields, as shown by the figures below for the case of the ModRM byte and the ModRM + SIB bytes.

Figures taken from Intel® 64 and IA-32 Architectures Software Developer’s Manual - Volume 2 Instruction Set Reference.

The following code listing provides an example for two load instructions with different addressing modes and gives a detailed breakdown of the instruction bytes for each instruction.

let mut asm = Asm::new();
asm.mov(r8, Mem64::indirect_disp(r9, 0xaabb));
asm.mov(r8, Mem64::indirect_base_index(r9, r10));

asm.disasm();
// Outputs:
//   4D8B81BBAA0000    mov r8,[r9+0xaabb]
//   4F8B0411          mov r8,[r9+r10]


// Breakdown of the indirect + displacement load.
//
//   4D 8B 81 BBAA0000    mov r8,[r9+0xaabb]
//   ^  ^  ^  `Imm32
//   |  |  `ModRM: mod=10,reg=000,rm=001
//   |  `Opc: mov
//   `REX: W=1,R=1,X=0,B=1
//
//   REX.R + ModRM.reg = 1000 (r8) - dest reg
//   REX.B + ModRm.rm  = 1001 (r9) - base reg


// Breakdown of the base + index load.
//
//   4F 8B 04 11          mov r8,[r9+r10]
//   ^  ^  ^  `SIB: scale=00,index=010,base=001
//   |  |  `ModRM: mod=00,reg=000,rm=100
//   |  `Opc: mov
//   `REX: W=1,R=1,X=1,B=1
//
//  ModRM.rm = 100 -> use SIB byte
//
//   REX.R + ModRM.reg = 1000 ( r8) - dest reg
//   REX.X + SIB.index = 1010 (r10) - index reg
//   REX.B + SIB.base  = 1001 ( r9) - base reg

References

Footnotes

1) In cpp functions with the same name but different arguments can be defined. During compilation overload resolution takes place and picks the "right" function to call on each call site.