Dynamic linking on Linux (x86_64)

I was curious about how dynamic linking works under the hood, so this article explores the topic. Since dynamic linking depends heavily on the OS, the architecture, and the binary format, I focus on a single combination here. As I spend most of my time with Linux on x86 or ARM, I chose Linux on x86_64 with the ELF binary format for this article.

Introduction to dynamic linking

Dynamic linking comes into play for applications that are not statically linked. Such an application uses code that is not included in the application itself but lives in a shared library, and a shared library can in turn be used by multiple applications. The applications contain relocation entries which need to be resolved at runtime, because shared libraries are compiled as position independent code (PIC) so that they can be loaded at any address in the application's virtual address space. This process of resolving the relocation entries at runtime is what I refer to as dynamic linking in this article.

The following figure shows a simple example, where we have an application foo using a function bar from the shared library libbar.so. The boxes show the virtual memory mapping for foo over time where time increases to the right.

         foo                                   foo
    +-----------+                         +-----------+
    |           |                         |           |
    +-----------+                         +-----------+
    | .text.foo |                         | .text.foo |
    |           |                         |           |
    | ...       | trigger resolve reloc   | ...       |
pc->| call bar  | X----+                  | call bar  |--+
    | ...       |      |                  | ...       |  |
    +-----------+      |                  +-----------+  |
    |           |      |                  |           |  |
    |           |      |                  |           |  |
    +-----------+      |                  +-----------+  |
    | .text.bar |      |                  | .text.bar |  |
    | ...       |      |                  | ...       |  |
    | bar:      |      +---->[ld.so]----> | bar:      |<-+pc
    | ...       |                         | ...       |
    +-----------+                         +-----------+
    |           |                         |           |
    +-----------+                         +-----------+

Conceptual overview && important parts of "the" ELF

In the following I assume a basic understanding of the ELF binary format.

Before jumping into the details of dynamic linking, it is important to get a conceptual overview and to understand which sections of the ELF file actually matter.

On x86, calling a function in a shared library works via one indirect jump. When the application wants to call a function in a shared library, it jumps to a well-known location contained in the code of the application, called a trampoline. From there the application jumps to a function pointer stored in a global table (GOT = global offset table). The application contains one trampoline per function used from a shared library.

When the application jumps to a trampoline for the first time, the trampoline dispatches to the dynamic linker with the request to resolve the symbol. Once the dynamic linker has found the address of the symbol, it patches the function pointer in the GOT so that subsequent calls dispatch directly to the library function.

    foo:                              GOT
      ...                        +------------+
+---- call bar_trampoline     +- | 0xcafeface | [0] resolve (dynamic linker)
|     call bar_trampoline     |  +------------+
|     ...                     |  | 0xcafeface | [1] resolve (dynamic linker)
|                             |  +------------+
+-> bar_trampoline:           |
      jump GOT[0] <-----------+
      jump GOT[1]

Once this is done, further calls to this symbol will be directly forwarded to the correct address from the corresponding trampoline.

    foo:                              GOT
      ...                        +------------+
      call bar_trampoline     +- | 0x01234567 | [0] bar (libbar.so)
+---- call bar_trampoline     |  +------------+
|     ....                    |  | 0xcafeface | [1] resolve (dynamic linker)
|                             |  +------------+
+-> bar_trampoline:           |
      jump GOT[0] <-----------+
      jump GOT[1]
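The patch-on-first-call idea from the two diagrams can be imitated in plain C with a function pointer table. This is only a sketch of the concept, not how the PLT is actually implemented: the names bar, resolve_bar, bar_trampoline, got, and resolve_count are made up for illustration, and the "library" function lives in the same file.

```c
/* Signature shared by the "library" function and its resolver stub. */
typedef int (*bar_fn)(int);

/* Counts how often the slow path (symbol resolution) runs. */
int resolve_count = 0;

/* Hypothetical stand-in for bar() living in libbar.so. */
int bar(int x) { return x * 2; }

int resolve_bar(int x);

/* Miniature GOT with a single slot; it initially points at the resolver. */
bar_fn got[1] = { resolve_bar };

/* Stand-in for the dynamic linker: "find" the symbol, patch the GOT slot,
 * then forward the original call to the resolved function. */
int resolve_bar(int x) {
    resolve_count++;
    got[0] = bar;
    return bar(x);
}

/* Trampoline: always an indirect call through the GOT slot. */
int bar_trampoline(int x) { return got[0](x); }
```

Calling bar_trampoline twice goes through resolve_bar only on the first call; afterwards got[0] points directly at bar and resolve_count stays at 1.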

With that in mind we can take a look and check which sections of the ELF file are important for the dynamic linking process.

.plt
This section contains all the trampolines for the external functions used by the ELF file.

.got.plt
This section contains the global offset table (GOT) for this ELF file's trampolines.

.rel.plt / .rela.plt
This section holds the relocation entries, which the dynamic linker uses to find which symbol needs to be resolved and which location in the GOT needs to be patched. (Whether it is rel or rela depends on the DT_PLTREL entry in the .dynamic section.)

The bits behind dynamic linking

Now that we have the basic concept and know which sections of the ELF file matter, we can take a look at an actual example. For the analysis I am going to use the following C program and build it explicitly as a non position independent executable (non-PIE).

Using -no-pie has no functional impact; it only gives us absolute virtual addresses in the ELF file, which makes the analysis easier to follow.

// main.c
#include <stdio.h>

int main(int argc, const char* argv[]) {
    printf("%s argc=%d\n", argv[0], argc);
    return 0;
}

> gcc -o main main.c -no-pie

We use radare2 to open the compiled file and print the disassembly of the .got.plt and .plt sections.

> r2 -A ./main
[0x00401050]> pd5 @ section..got.plt
            ;-- section..got.plt:
            ;-- _GLOBAL_OFFSET_TABLE_:
       [0]  0x00404000      .qword 0x0000000000403e10 ; section..dynamic ; sym..dynamic
       [1]  0x00404008      .qword 0x0000000000000000
       [2]  0x00404010      .qword 0x0000000000000000
            ;-- reloc.puts:
       [3]  0x00404018      .qword 0x0000000000401036
            ;-- reloc.printf:
       [4]  0x00404020      .qword 0x0000000000401046

[0x00401050]> pd9 @ section..plt
            ;-- section..plt:
       ┌┌─> 0x00401020      ff35e22f0000   push qword [0x00404008]
       ╎╎   0x00401026      ff25e42f0000   jmp qword [0x00404010]
       ╎╎   0x0040102c      0f1f4000       nop dword [rax]
     int sym.imp.puts (const char *s);
       ╎╎   0x00401030      ff25e22f0000   jmp qword [reloc.puts]   ; 0x00404018
       ╎╎   0x00401036      6800000000     push 0
       └──< 0x0040103b      e9e0ffffff     jmp sym..plt
     int sym.imp.printf (const char *format);
        ╎   0x00401040      ff25da2f0000   jmp qword [reloc.printf] ; 0x00404020
        ╎   0x00401046      6801000000     push 1
        └─< 0x0040104b      e9d0ffffff     jmp sym..plt

Taking a quick look at the .got.plt section we see the global offset table (GOT). The entries GOT[0..2] have special meanings: GOT[0] holds the address of the .dynamic section for this ELF file, and GOT[1..2] will be filled by the dynamic linker at program startup. Entries GOT[3] and GOT[4] contain the function pointers for puts and printf respectively.

In the .plt section we can find three trampolines:
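The mapping between slot addresses and GOT indices is plain pointer arithmetic, since every .got.plt entry on x86_64 is an 8-byte pointer. A minimal sketch, where got_index and GOT_ENTRY_SIZE are names made up for this illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Each .got.plt entry on x86_64 is one 8-byte pointer. */
enum { GOT_ENTRY_SIZE = 8 };

/* Map the address of a slot back to its index within the table. */
size_t got_index(uint64_t got_base, uint64_t slot_addr) {
    return (size_t)((slot_addr - got_base) / GOT_ENTRY_SIZE);
}
```

With the addresses from the dump, got_index(0x404000, 0x404018) gives 3 (reloc.puts) and got_index(0x404000, 0x404020) gives 4 (reloc.printf).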

  1. 0x00401020 dispatch to runtime linker (special role)
  2. 0x00401030 puts
  3. 0x00401040 printf

Looking at the puts trampoline, we can see that the first instruction jumps to the address stored at 0x00404018 (reloc.puts), which is GOT[3]. Initially this entry contains the address of the push 0 instruction that comes right after the jmp. This push instruction sets up some metadata for the dynamic linker. The next instruction then jumps into the first trampoline, which pushes more metadata (GOT[1]) onto the stack and then jumps to the address stored in GOT[2].

GOT[1] & GOT[2] are zero here because they get filled by the dynamic linker at program startup.
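As a side note, the GOT addresses radare2 prints next to the jmp and push instructions are not stored in the instruction bytes directly. The bytes hold a 32-bit little-endian displacement relative to the address of the next instruction (RIP-relative addressing). The following sketch redoes that arithmetic; rip_relative_target and read_disp32 are hypothetical helper names, and the instruction lengths have to be supplied by hand.

```c
#include <stdint.h>

/* Read a little-endian 32-bit displacement from raw instruction bytes. */
int32_t read_disp32(const uint8_t *p) {
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (int32_t)v;
}

/* The displacement is added to the address of the *next* instruction,
 * which is the instruction address plus the instruction length. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp32) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

For the puts trampoline at 0x00401030 (bytes ff25e22f0000, 6 bytes long) the displacement is 0x2fe2, so the operand is 0x00401036 + 0x2fe2 = 0x00404018. The same arithmetic applies to the relative jmp at 0x0040103b (bytes e9e0ffffff, 5 bytes long), which lands on 0x00401020.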

To understand the push 0 instruction in the puts trampoline we have to take a look at the third section of interest in the ELF file, the .rela.plt section.

# -W    wide output (do not truncate columns)
# -r    print relocations
> readelf -W -r ./main
Relocation section '.rela.plt' at offset 0x4004d8 contains 2 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000404018  0000000200000007 R_X86_64_JUMP_SLOT     0000000000000000 puts@GLIBC_2.2.5 + 0
0000000000404020  0000000300000007 R_X86_64_JUMP_SLOT     0000000000000000 printf@GLIBC_2.2.5 + 0

The 0 passed as metadata to the dynamic linker means to use the relocation at index [0] in the .rela.plt section. The ELF specification defines a relocation of type rela as follows:

// man 5 elf
typedef struct {
    Elf64_Addr r_offset;
    uint64_t   r_info;
    int64_t    r_addend;
} Elf64_Rela;

#define ELF64_R_SYM(i)   ((i) >> 32)
#define ELF64_R_TYPE(i)  ((i) & 0xffffffff)

r_offset holds the address of the GOT entry which the dynamic linker should patch once it has found the address of the requested symbol. The offset here is 0x00404018, which is exactly the address of GOT[3], the function pointer used in the puts trampoline. From r_info the dynamic linker can find out which symbol it should look for.

ELF64_R_SYM(0x0000000200000007) -> 0x2
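The same decoding can be written as two small C functions that mirror the ELF64_R_SYM and ELF64_R_TYPE macros. This is a sketch; rela_sym_index, rela_type, and RELOC_TYPE_JUMP_SLOT are names chosen for this example, with 7 being the numeric value of R_X86_64_JUMP_SLOT.

```c
#include <stdint.h>

/* Numeric value of R_X86_64_JUMP_SLOT, the type used for PLT relocations. */
enum { RELOC_TYPE_JUMP_SLOT = 7 };

/* Upper 32 bits of r_info: index into the dynamic symbol table. */
uint32_t rela_sym_index(uint64_t r_info) { return (uint32_t)(r_info >> 32); }

/* Lower 32 bits of r_info: relocation type. */
uint32_t rela_type(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffffu); }
```

Applied to the first entry of the relocation dump, rela_sym_index(0x0000000200000007) yields 2 and rela_type(0x0000000200000007) yields 7, matching the R_X86_64_JUMP_SLOT type shown by readelf.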

The resulting index [2] is the index into the dynamic symbol table (.dynsym). Dumping the dynamic symbol table with readelf, we can see that the symbol at index [2] is puts.

# -s    print symbols
> readelf -W -s ./main
Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTable
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (2)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
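Putting the pieces together, the chain from the index pushed by a trampoline to the symbol name and the GOT slot to patch can be sketched as a table lookup. The tables below are hard-coded copies of the readelf dumps above. The types rela_t and plt_request and the function describe_plt_request are invented for this illustration; a real dynamic linker reads these tables from the loaded ELF image and then searches the loaded libraries for the symbol.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for Elf64_Rela. */
typedef struct { uint64_t r_offset; uint64_t r_info; int64_t r_addend; } rela_t;

/* The two .rela.plt entries from the relocation dump. */
static const rela_t rela_plt[] = {
    { 0x404018, 0x0000000200000007ULL, 0 },
    { 0x404020, 0x0000000300000007ULL, 0 },
};

/* Names of the first four .dynsym entries from the symbol dump. */
static const char *const dynsym_names[] = {
    "", "_ITM_deregisterTMCloneTable", "puts", "printf",
};

/* What the resolver learns from the index pushed by a trampoline. */
typedef struct { const char *name; uint64_t got_slot; } plt_request;

/* reloc_index is the value pushed by the trampoline (push 0 / push 1). */
plt_request describe_plt_request(size_t reloc_index) {
    const rela_t *r = &rela_plt[reloc_index];
    size_t sym = (size_t)(r->r_info >> 32);   /* same as ELF64_R_SYM */
    plt_request req = { dynsym_names[sym], r->r_offset };
    return req;
}
```

describe_plt_request(0) reports the name puts and the slot 0x404018, and describe_plt_request(1) reports printf and 0x404020, which are the two GOT entries seen in the radare2 dump.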

Appendix: .dynamic section

The .dynamic section of an ELF file contains important information for the dynamic linking process and is created when linking the ELF file.

The information can be accessed at runtime using the following symbol

extern Elf64_Dyn _DYNAMIC[];

which is an array of Elf64_Dyn entries

typedef struct {
    Elf64_Sxword    d_tag;
    union {
        Elf64_Xword d_val;
        Elf64_Addr  d_ptr;
    } d_un;
} Elf64_Dyn;

Since this meta-information is specific to an ELF file, every ELF file has its own .dynamic section and _DYNAMIC symbol.

The following entries are the most interesting ones for dynamic linking:

Tag           d_un    Meaning
DT_PLTGOT     d_ptr   address of .got.plt
DT_JMPREL     d_ptr   address of .rela.plt
DT_PLTRELSZ   d_val   size of .rela.plt table
DT_RELENT     d_val   size of a single REL entry (PLTREL == DT_REL)
DT_RELAENT    d_val   size of a single RELA entry (PLTREL == DT_RELA)

We can use readelf to dump the .dynamic section. In the following snippet I only kept the relevant entries:

# -d dump .dynamic section
> readelf -d ./main

Dynamic section at offset 0x2e10 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000003 (PLTGOT)             0x404000
 0x0000000000000002 (PLTRELSZ)           48 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x4004d8
 0x0000000000000009 (RELAENT)            24 (bytes)

We can see that PLTGOT points to address 0x404000, which is the address of the GOT as we saw in the radare2 dump. JMPREL points to the relocation table at 0x4004d8. PLTRELSZ / RELAENT = 48 / 24 tells us that we have 2 relocation entries, which are exactly the ones for puts and printf.
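The same lookup can be done programmatically by walking an Elf64_Dyn array until the terminating DT_NULL entry. The sketch below assumes a Linux system providing <elf.h>; find_dyn and plt_rela_count are hypothetical helper names, and the count is only meaningful when DT_PLTREL is DT_RELA.

```c
#include <elf.h>
#include <stddef.h>

/* Return the first entry with the given tag, or NULL if there is none.
 * The array is terminated by an entry with tag DT_NULL. */
const Elf64_Dyn *find_dyn(const Elf64_Dyn *dyn, Elf64_Sxword tag) {
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        if (dyn->d_tag == tag) {
            return dyn;
        }
    }
    return NULL;
}

/* Number of PLT relocations, assuming RELA entries (DT_PLTREL == DT_RELA). */
size_t plt_rela_count(const Elf64_Dyn *dyn) {
    const Elf64_Dyn *sz = find_dyn(dyn, DT_PLTRELSZ);
    const Elf64_Dyn *ent = find_dyn(dyn, DT_RELAENT);
    if (sz == NULL || ent == NULL || ent->d_un.d_val == 0) {
        return 0;
    }
    return (size_t)(sz->d_un.d_val / ent->d_un.d_val);
}
```

In a dynamically linked program, passing the _DYNAMIC symbol described above as dyn would inspect the running executable's own entries; for the example binary plt_rela_count should then report 2.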