The “LD_PRELOAD trick” is a powerful technique for building tools and gadgets that dynamically replace symbols in an executable. This guide focuses on malloc and free, but the technique is of course applicable to any other symbol. Code snippets in this post typically assume a Linux system, with a brief look at interposition on macOS and Windows at the end.

What is interposition?

Let’s look at the shared libraries a random executable requires, say the ls command:

$ ldd /usr/bin/ls
        linux-vdso.so.1 (0x00007ffc4a9fe000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x0000790952e60000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000790952c00000)
        libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x0000790952b69000)
        /lib64/ld-linux-x86-64.so.2 (0x0000790952ec4000)

Each of those libraries contains functions and/or variables (together: symbols) that the program, ls, requires to execute properly. The shared objects are loaded into the program’s address space at runtime. We can interpose on the original shared objects, replacing any defined symbol with a different version. This is done by providing another shared object whose symbols take priority over the ones defined in those initial libraries.

Example

Let’s walk through a first simple example replacing a custom function (not malloc). Here, we have three files, main.cpp with a main that just calls some f function, f.cpp which has the “normal” f implementation, and g.cpp which has the version we’d like to run instead.

// main.cpp
extern void f();
int main() {
    f();
}

// f.cpp
#include <iostream>
void f() {
    std::cout << "Hello, World!" << std::endl;
}

// g.cpp
#include <iostream>
void f() {
    std::cout << "Something is not quite right." << std::endl;
}

We compile the files like so:

$ clang++ -shared -fPIC -o libf.so f.cpp
$ clang++ -shared -fPIC -o libg.so g.cpp
$ clang++ -L. -Wl,-rpath=. -o main main.cpp -lf

When running the program normally, we get the intended “Hello, World!” message:

$ ./main
Hello, World!

We also see libf.so in the list of libraries required by the main executable:

$ ldd ./main
        linux-vdso.so.1 (0x00007ffebcac8000)
        libf.so => ./libf.so (0x00007f6f7b796000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6f7b400000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6f7b69d000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6f7b67d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6f7b000000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6f7b7a2000)

We can now interpose libf.so with libg.so, simply by setting the LD_PRELOAD environment variable. This replaces the f function that the program is meant to use with the one defined in libg.so!

$ LD_PRELOAD=./libg.so ./main
Something is not quite right.

The LD_PRELOAD environment variable is read by the dynamic loader. The specified library or libraries are loaded before all others, so their symbols take precedence over any other definitions.
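
To check which definition actually wins, we can ask the dynamic loader where a symbol resolves. Below is a minimal sketch, assuming glibc (dladdr is an extension, not POSIX). The helper name provider_of is hypothetical and not part of any library.

```cpp
// which_object.cpp: report which shared object a symbol resolves to.
#include <dlfcn.h>

#include <string>

// Returns the path of the object providing `symbol` in the global lookup
// order, or an empty string if the symbol (or its origin) cannot be found.
std::string provider_of(const char* symbol) {
    // RTLD_DEFAULT follows the same search order the loader uses, so a
    // preloaded definition is found first.
    void* address = dlsym(RTLD_DEFAULT, symbol);
    if (address == nullptr) {
        return {};
    }

    Dl_info info{};
    if (dladdr(address, &info) == 0 || info.dli_fname == nullptr) {
        return {};
    }
    return info.dli_fname;
}
```

Calling provider_of("malloc") from a program started with LD_PRELOAD should name the preloaded library if it defines malloc, and the C library otherwise. For a C++ function such as f above, the lookup needs the mangled name (_Z1fv under the Itanium ABI).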

Interposing on malloc

While memory allocation functions (malloc family, free) have some quirks (explained below), they are basically just another symbol. They are defined in libc.so. Of course, the main problem is that we now have to re-implement malloc. If that was the goal, great! If not, more to come below.

extern "C" void* malloc(size_t size) {
    // Custom malloc implementation.
}

extern "C" void free(void* ptr) {
    // Custom free implementation.
}

// Of course, you might want to also provide calloc, realloc, aligned_alloc, ...
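
To give an idea of what the smallest possible “custom implementation” could look like, here is a deliberately naive bump allocator sketch. The names bump_malloc and bump_free are hypothetical, the arena size is arbitrary, and the code is not thread-safe. It only illustrates the contract (aligned, non-overlapping blocks, or nullptr) and is not a usable allocator.

```cpp
// bump.cpp: a deliberately naive bump allocator (hypothetical names).
#include <cstddef>

namespace {
constexpr std::size_t kArenaSize = 1 << 20; // 1 MiB, arbitrary for the sketch.
alignas(std::max_align_t) unsigned char arena[kArenaSize];
std::size_t offset = 0; // Not thread-safe: real code needs atomics or a lock.
} // namespace

// Would be exposed as `extern "C" void* malloc(size_t)` in an interposer.
void* bump_malloc(std::size_t size) {
    constexpr std::size_t alignment = alignof(std::max_align_t);
    if (size == 0) {
        size = 1; // Hand out a distinct pointer even for empty requests.
    }
    // Round up so that every returned pointer stays suitably aligned.
    const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    if (rounded < size || rounded > kArenaSize - offset) {
        return nullptr; // Arithmetic overflow, or the arena is exhausted.
    }
    void* result = arena + offset;
    offset += rounded;
    return result;
}

// Would be exposed as `extern "C" void free(void*)`. Memory is never reused.
void bump_free(void*) {}
```

A real replacement also has to keep calloc, realloc and friends consistent with it, since a pointer returned by one allocator must never be handed to another allocator’s free.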

Wrapping malloc by retrieving the next definition

Sometimes, we only care about “observing” allocations. Instead of having to re-implement malloc, the existing symbol that was just replaced can be retrieved!

// noop.cpp
#include <dlfcn.h>

extern "C" void* malloc(size_t size) {
    static auto* next
        = reinterpret_cast<decltype(malloc)*>(dlsym(RTLD_NEXT, "malloc"));
    return next(size);
}

Compile this file with the dl library:

$ clang++ -shared -fPIC -o libnoop.so noop.cpp -ldl

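What happens here: next is a static local variable, so its initializer runs only the first time the function is called. That initializer calls dlsym with RTLD_NEXT, which looks up the next definition of malloc in the search order after the current library (normally the one in libc). The resulting function pointer is stored in next, and we can just call it!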
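
dlsym returns a null pointer when there is no next definition, and calling through it would crash. A small sketch of a lookup helper that checks for this is shown here. The names lookup_next and next_malloc are hypothetical, and the error message goes through write(2) on the assumption that higher-level I/O might allocate.

```cpp
// next_symbol.cpp: resolve the next definition of a symbol, with a null check.
#include <dlfcn.h>
#include <unistd.h>

#include <cstdlib>
#include <cstring>

// Returns the next definition of `name` in the search order after the calling
// object, or nullptr (after printing a message) if there is none.
template <typename Function>
Function* lookup_next(const char* name) {
    void* symbol = dlsym(RTLD_NEXT, name);
    if (symbol == nullptr) {
        // Use write(2) directly: iostream or printf might allocate memory.
        static const char prefix[] = "lookup_next: no next definition of ";
        ssize_t ignored = write(STDERR_FILENO, prefix, sizeof(prefix) - 1);
        ignored = write(STDERR_FILENO, name, std::strlen(name));
        ignored = write(STDERR_FILENO, "\n", 1);
        (void) ignored;
    }
    return reinterpret_cast<Function*>(symbol);
}

// Example use: resolve the next malloc once and cache it.
decltype(std::malloc)* next_malloc() {
    static auto* next = lookup_next<decltype(std::malloc)>("malloc");
    return next;
}
```

Inside a wrapper, the caller still has to decide what to do with a null result, for instance returning nullptr from malloc or aborting.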

Common issues and tips

It’s nice to be able to allocate memory within the wrapper itself: memory allocation is almost always needed for any kind of output, whether to standard output or to a file. However, if the simple wrapper above allocates, that allocation calls the wrapper again, which causes infinite recursion. This is solved with a single thread-local integer: increment it when entering a block that might allocate memory, decrement it when leaving the block, and forward straight to the next malloc whenever it is non-zero.

#include <cstdint>

#include <dlfcn.h>

namespace {
thread_local std::int64_t busy = 0;
} // namespace

extern "C" void* malloc(size_t size) {
    static auto* next
        = reinterpret_cast<decltype(malloc)*>(dlsym(RTLD_NEXT, "malloc"));

    if (busy) {
        return next(size);
    }

    ++busy;

    // Do any kind of analysis, writing to file, ... here.

    --busy;

    return next(size);
}
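
The manual ++busy / --busy pair is easy to get wrong once the instrumented block has early returns. One possible refinement, sketched here with the hypothetical names BusyGuard and is_busy, is to tie the counter to a scope with RAII:

```cpp
// busy_guard.cpp: RAII wrapper around the per-thread busy counter.
#include <cstdint>

namespace {
thread_local std::int64_t busy = 0;
} // namespace

// Marks the current thread as busy for the lifetime of the object.
class BusyGuard {
public:
    BusyGuard() noexcept { ++busy; }
    ~BusyGuard() noexcept { --busy; }

    // Copying a guard would unbalance the counter.
    BusyGuard(const BusyGuard&) = delete;
    BusyGuard& operator=(const BusyGuard&) = delete;
};

// True while the current thread is inside an instrumented block.
bool is_busy() noexcept {
    return busy > 0;
}
```

In the wrapper, this would read: if is_busy() returns true, forward to next(size) immediately; otherwise declare a BusyGuard at the top of the block that does the analysis, and the counter is restored however the block exits.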

Another common need is to have global state, or to run some initialization before starting to record allocations. In C++, this is easily solved with a static object whose constructor and destructor do the setup and cleanup. Until the constructor has run, the wrapper simply forwards to the next malloc.

#include <atomic>
#include <cstdint>

#include <dlfcn.h>

namespace {
thread_local std::int64_t busy = 0;
std::atomic_bool initialized{false};
} // namespace

const struct Initialization {
    Initialization() {
        // Some initialization...
        initialized = true;
    }

    ~Initialization() {
        initialized = false;
        // Maybe some cleanup as needed...
    }
} _;

extern "C" void* malloc(size_t size) {
    // [...], see snippet above.

    if (!initialized || busy > 0) {
        return next(size);
    }

    // [...], see snippet above.
}
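
As an illustration of the kind of global state such a wrapper might keep, here is a sketch of lock-free counters. AllocationStats is a hypothetical name. Because std::atomic has a constexpr constructor, a namespace-scope instance should be constant-initialized, so it would be usable even before the Initialization constructor has run.

```cpp
// stats.cpp: allocation counters that need neither locks nor allocation.
#include <atomic>
#include <cstddef>
#include <cstdint>

struct AllocationStats {
    std::atomic<std::uint64_t> calls{0};
    std::atomic<std::uint64_t> bytes{0};

    // Records one allocation request. Relaxed ordering is enough here because
    // the counters are independent totals, not synchronization points.
    void record(std::size_t size) noexcept {
        calls.fetch_add(1, std::memory_order_relaxed);
        bytes.fetch_add(size, std::memory_order_relaxed);
    }
};
```

The wrapper would call record(size) inside the guarded block, and the destructor of the static object is a natural place to print the totals.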

Interposing on macOS

Interposing is slightly different on macOS. In some ways, it’s easier! First, DYLD_INSERT_LIBRARIES is the equivalent of the LD_PRELOAD environment variable. On the code side, the following is needed:

extern "C" void* interpose_malloc__(size_t size) {
    // [...], custom malloc wrapper/implementation.
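    // Since the function name is different, you can in fact just call malloc
    // here: direct calls from this library are not interposed. Calls made by
    // other libraries on your behalf (e.g. iostream) may still reach the
    // wrapper, so a re-entrancy guard remains useful when logging.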
}

__attribute__((used)) static struct {
    const void* replacement;
    const void* replacee;
} _interpose_malloc __attribute__((section("__DATA,__interpose")))
= {(const void*) (unsigned long) &interpose_malloc__,
   (const void*) (unsigned long) &malloc};

Putting it all together

Below is a simple example / template that only writes all malloc and free calls to standard output. It works under both Linux and macOS, thanks to a few helper macros for Linux/macOS compatibility.

// logger.cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>

#ifdef __APPLE__
#define DYLD_INTERPOSE(_replacement, _replacee)                                \
    __attribute__((used)) static struct {                                      \
        const void* replacement;                                               \
        const void* replacee;                                                  \
    } _interpose_##_replacee __attribute__((section("__DATA,__interpose")))    \
    = {(const void*) (unsigned long) &_replacement,                            \
       (const void*) (unsigned long) &_replacee}
#define GET_NEXT_FUNCTION(NAME) ::NAME
#define INTERPOSE_FUNCTION_NAME(NAME) interpose_##NAME##__
#define INTERPOSE(NAME) DYLD_INTERPOSE(INTERPOSE_FUNCTION_NAME(NAME), NAME)
#else
#include <dlfcn.h>
#define GET_NEXT_FUNCTION(NAME)                                                \
    (reinterpret_cast<decltype(::NAME)*>(dlsym(RTLD_NEXT, #NAME)))
#define INTERPOSE_FUNCTION_NAME(NAME) NAME
#define INTERPOSE(NAME)
#endif

namespace {
std::atomic_bool initialized{false};
thread_local std::int64_t busy = 0;

const struct Initialization {
    Initialization() {
        // Some initialization...
        initialized = true;
    }

    ~Initialization() {
        initialized = false;
        // Maybe some cleanup as needed...
    }
} _;
} // namespace

extern "C" void* INTERPOSE_FUNCTION_NAME(malloc)(size_t size) {
    static auto* next = GET_NEXT_FUNCTION(malloc);
    void* result = next(size);

    if (!initialized || busy > 0) {
        return result;
    }

    ++busy;
    std::cout << "malloc(" << size << ") -> " << result << std::endl;
    --busy;

    return result;
}
INTERPOSE(malloc);

extern "C" void INTERPOSE_FUNCTION_NAME(free)(void* pointer) {
    static auto* next = GET_NEXT_FUNCTION(free);
    next(pointer);

    if (!initialized || busy > 0) {
        return;
    }

    ++busy;
    std::cout << "free(" << pointer << ")" << std::endl;
    --busy;
}
INTERPOSE(free);

Compile and use with these commands:

# On Linux.
$ clang++ -shared -fPIC -o liblogger.so logger.cpp -ldl
$ LD_PRELOAD=./liblogger.so ls

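# On macOS. With System Integrity Protection enabled, DYLD_* variables are
# ignored for protected system binaries such as /bin/ls, so you may need to
# run a binary you built yourself instead.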
$ clang++ -shared -fPIC -o liblogger.dylib logger.cpp -ldl
$ DYLD_INSERT_LIBRARIES=./liblogger.dylib ls
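
The template prints through std::cout, which itself allocates and therefore relies on the busy guard. A possible alternative formats each line into a stack buffer and emits it with write(2). The sketch below assumes that snprintf does not allocate for these integer and pointer conversions, which is typically true but not guaranteed, so the guard should stay in place. format_malloc_line and log_malloc are hypothetical names.

```cpp
// raw_log.cpp: format a log line on the stack and emit it with write(2).
#include <unistd.h>

#include <cstddef>
#include <cstdio>

// Formats one log line into `buffer` and returns the number of bytes to
// write (excluding the terminating NUL). Output is truncated if needed.
std::size_t format_malloc_line(char* buffer, std::size_t capacity,
                               std::size_t size, const void* result) {
    if (capacity == 0) {
        return 0;
    }
    const int written
        = std::snprintf(buffer, capacity, "malloc(%zu) -> %p\n", size, result);
    if (written < 0) {
        return 0; // Encoding error: nothing sensible to print.
    }
    const auto length = static_cast<std::size_t>(written);
    return length < capacity ? length : capacity - 1;
}

// Writes the formatted line to standard output without going through stdio
// or iostream buffering.
void log_malloc(std::size_t size, const void* result) {
    char buffer[64];
    const std::size_t length
        = format_malloc_line(buffer, sizeof(buffer), size, result);
    const ssize_t ignored = write(STDOUT_FILENO, buffer, length);
    (void) ignored;
}
```

In logger.cpp, log_malloc(size, result) would take the place of the std::cout line between ++busy and --busy.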

Interposing on Windows

Interposition as described here ranges from overly complicated to impossible on Windows. Switch to a Unix system.