Guide: Using dynamic linking/loading to interpose on malloc
The “LD_PRELOAD trick” is a powerful technique to build tools and gadgets that dynamically replace symbols in an executable. This guide focuses on malloc and free, but the technique of course applies to any other symbol.
Code snippets in this post typically assume a Linux system, with a brief look at interposition on macOS and Windows at the end.
What is interposition?
Let’s look at the shared libraries a random executable requires, say the ls command:
$ ldd /usr/bin/ls
linux-vdso.so.1 (0x00007ffc4a9fe000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x0000790952e60000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000790952c00000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x0000790952b69000)
/lib64/ld-linux-x86-64.so.2 (0x0000790952ec4000)
Each of those libraries contains functions and/or variables (together: symbols) that the program, ls, requires in order to execute properly. The shared objects are loaded into the program’s address space at runtime. We can interpose on the original shared objects, replacing any symbol they define with a different version. This is done by providing another shared object whose symbols take priority over the ones defined in those initial libraries.
Example
Let’s walk through a first simple example that replaces a custom function (not malloc). Here, we have three files: main.cpp, with a main that just calls some function f; f.cpp, which has the “normal” f implementation; and g.cpp, which has the version we’d like to run instead.
// main.cpp
extern void f();
int main() {
  f();
}
// f.cpp
#include <iostream>
void f() {
  std::cout << "Hello, World!" << std::endl;
}
// g.cpp
#include <iostream>
void f() {
  std::cout << "Something is not quite right." << std::endl;
}
We compile the files like so:
$ clang++ -shared -fPIC -o libf.so f.cpp
$ clang++ -shared -fPIC -o libg.so g.cpp
$ clang++ -L. -Wl,-rpath=. -o main main.cpp -lf
When running the program normally, we get the intended “Hello, World!” message:
$ ./main
Hello, World!
We also see libf.so in the list of libraries required by the main executable:
$ ldd ./main
linux-vdso.so.1 (0x00007ffebcac8000)
libf.so => ./libf.so (0x00007f6f7b796000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6f7b400000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6f7b69d000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6f7b67d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6f7b000000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6f7b7a2000)
We can now interpose on libf.so with libg.so simply by setting the LD_PRELOAD environment variable. This replaces the f function that the program is meant to use with the one defined in libg.so!
$ LD_PRELOAD=./libg.so ./main
Something is not quite right.
The LD_PRELOAD environment variable is read by the dynamic loader: the listed library or libraries are loaded alongside the others but placed ahead of the program’s other shared libraries in the symbol lookup order, so their symbols take precedence over the definitions in those libraries.
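To check which shared object actually wins for a given symbol, a process can ask the dynamic loader itself. The following is a minimal sketch assuming glibc on Linux: provider_of is a hypothetical helper name, while dlsym with RTLD_DEFAULT and dladdr are GNU extensions that g++ and clang++ enable by default (on glibc older than 2.34, link with -ldl).

```cpp
// which_lib.cpp: which loaded object supplies a symbol? (Linux/glibc sketch)
#include <dlfcn.h>  // dlsym, dladdr, Dl_info, RTLD_DEFAULT (GNU extensions)

// Hypothetical helper: returns the file name of the object whose definition of
// `symbol` the dynamic loader resolves first (for example a path to libc.so.6),
// or nullptr if no loaded object defines it. C++ symbols must be passed mangled.
const char* provider_of(const char* symbol) {
  // RTLD_DEFAULT searches the global scope in normal lookup order, so a
  // library preloaded with LD_PRELOAD is found before the program's own deps.
  void* address = dlsym(RTLD_DEFAULT, symbol);
  if (address == nullptr) {
    return nullptr;
  }
  Dl_info info;
  // dladdr maps the address back to the object that contains it.
  if (dladdr(address, &info) == 0) {
    return nullptr;
  }
  return info.dli_fname;
}
```

Because f is a C++ function, its exported name is mangled: under the Itanium C++ ABI, void f() becomes _Z1fv. Printing provider_of("_Z1fv") from main.cpp should therefore name libf.so in a normal run and libg.so under LD_PRELOAD=./libg.so. This is also why the malloc wrappers below are declared extern "C": they must export the plain, unmangled name.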
Interposing on malloc
While memory allocation functions (the malloc family, free) have some quirks (explained below), they are basically just another symbol, defined in libc.so. Of course, the main problem is that we now have to re-implement malloc. If that was the goal, great! If not, more to come below.
#include <cstddef>  // std::size_t
extern "C" void* malloc(std::size_t size) {
  // Custom malloc implementation.
}
extern "C" void free(void* ptr) {
  // Custom free implementation.
}
// Of course, you might want to also provide calloc, realloc, aligned_alloc, ...
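For the “re-implement it” route, here is a deliberately tiny sketch of a complete replacement: a bump allocator that carves blocks out of a fixed static arena and never reuses memory. It rests on stated assumptions (a 64 MiB arena is enough, nothing ever needs memory back) and is an illustration, not a usable allocator. It provides malloc, free, calloc and realloc, the minimum set the glibc manual requires of a replacement malloc. Functions such as aligned_alloc, posix_memalign and malloc_usable_size are not covered here, so a program relying on them could end up mixing the two allocators.

```cpp
// bump.cpp: toy replacement for malloc/free/calloc/realloc (illustration only).
#include <atomic>
#include <cerrno>
#include <cstddef>
#include <cstring>

namespace {
// Assumption for this sketch: 64 MiB is enough for the whole program.
constexpr std::size_t kArenaSize = std::size_t{64} << 20;
constexpr std::size_t kAlign = alignof(std::max_align_t);

// All state is constant-initialized (zero-filled .bss), because malloc may be
// called before any C++ constructor has run.
alignas(kAlign) unsigned char arena[kArenaSize];
std::atomic<std::size_t> used{0};

// Each block starts with a header recording the requested size, for realloc.
struct alignas(kAlign) Header {
  std::size_t size;
};
}  // namespace

extern "C" void* malloc(std::size_t size) noexcept {
  if (size > kArenaSize) {
    errno = ENOMEM;
    return nullptr;
  }
  // Round header + payload up to kAlign so the next block stays aligned too.
  std::size_t total = (sizeof(Header) + size + kAlign - 1) & ~(kAlign - 1);
  // Claim the range atomically; this toy never recovers once the arena is full.
  std::size_t offset = used.fetch_add(total, std::memory_order_relaxed);
  if (offset + total > kArenaSize) {
    errno = ENOMEM;
    return nullptr;
  }
  auto* header = reinterpret_cast<Header*>(arena + offset);
  header->size = size;
  return header + 1;
}

extern "C" void free(void*) noexcept {
  // A bump allocator never gives memory back.
}

extern "C" void* calloc(std::size_t count, std::size_t size) noexcept {
  std::size_t bytes;
  if (__builtin_mul_overflow(count, size, &bytes)) {  // GCC/Clang builtin
    errno = ENOMEM;
    return nullptr;
  }
  void* result = malloc(bytes);
  if (result != nullptr) {
    std::memset(result, 0, bytes);
  }
  return result;
}

extern "C" void* realloc(void* pointer, std::size_t size) noexcept {
  if (pointer == nullptr) {
    return malloc(size);
  }
  std::size_t old_size = (static_cast<Header*>(pointer) - 1)->size;
  void* result = malloc(size);
  if (result != nullptr) {
    // Copy the smaller of the old and new sizes; the old block is abandoned.
    std::memcpy(result, pointer, old_size < size ? old_size : size);
  }
  return result;
}
```

The size header is what lets realloc know how many bytes to copy. Built as a shared library (clang++ -shared -fPIC -o libbump.so bump.cpp), it can be preloaded just like libg.so above.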
Wrapping malloc by retrieving the next definition
Sometimes we only care about “observing” allocations. Instead of re-implementing malloc, we can retrieve the existing symbol that was just replaced!
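// noop.cpp
#include <cstddef>  // std::size_t
#include <dlfcn.h>  // dlsym, RTLD_NEXT
extern "C" void* malloc(std::size_t size) {
  // Look up the next definition of malloc (normally libc's) on the first call only.
  static auto* next
      = reinterpret_cast<decltype(malloc)*>(dlsym(RTLD_NEXT, "malloc"));
  return next(size);
}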
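Compile this file with the dl library (on glibc 2.34 and later, the dl functions are part of libc itself and -ldl is only kept for compatibility):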
$ clang++ -shared -fPIC -o libnoop.so noop.cpp -ldl
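What happens here: the static local variable next is initialized only the first time the function runs. At that point, dlsym is called with RTLD_NEXT, which looks up the next definition of malloc in the search order after the current library, i.e. the original one from libc. The resulting function pointer is stored in next, and we can just call it!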
Common issues and tips
It’s often useful to be able to allocate memory within this wrapper: memory allocation is almost always needed for any kind of output, whether to standard output or to a file. But once the wrapper above is extended with such output, it recurses infinitely: the output code calls malloc, which runs the output code again, and so on. This is solved with a single thread-local counter: increment it when entering a block that might allocate memory, decrement it when leaving that block, and forward directly to the original malloc whenever it is non-zero.
#include <cstddef>
#include <cstdint>
#include <dlfcn.h>
namespace {
// Per-thread nesting depth: non-zero while our own code is running.
thread_local std::int64_t busy = 0;
}  // namespace
extern "C" void* malloc(std::size_t size) {
  static auto* next
      = reinterpret_cast<decltype(malloc)*>(dlsym(RTLD_NEXT, "malloc"));
  if (busy) {
    // Re-entered from our own analysis code: forward without recording.
    return next(size);
  }
  ++busy;
  // Do any kind of analysis, writing to file, ... here.
  --busy;
  return next(size);
}
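A small variation on the same idea wraps the counter in an RAII guard, so that every return path decrements it, including ones added later. This is a sketch; ReentrancyGuard is a hypothetical helper, not part of any library.

```cpp
// guard.cpp: an RAII variant of the thread-local re-entrancy counter.
#include <cstdint>

namespace {
thread_local std::int64_t busy = 0;
}  // namespace

// Hypothetical helper: increments the counter on construction and decrements
// it on destruction, so early returns cannot leave the counter unbalanced.
class ReentrancyGuard {
 public:
  ReentrancyGuard() : reentered_(busy > 0) { ++busy; }
  ~ReentrancyGuard() { --busy; }
  ReentrancyGuard(const ReentrancyGuard&) = delete;
  ReentrancyGuard& operator=(const ReentrancyGuard&) = delete;

  // True if another guard was already active on this thread, i.e. we are
  // being called from inside our own instrumentation.
  bool reentered() const { return reentered_; }

 private:
  bool reentered_;
};

// How a wrapper body could use it (sketch; `next` is the RTLD_NEXT pointer):
//   ReentrancyGuard guard;
//   void* result = next(size);
//   if (!guard.reentered()) { /* analysis, writing to file, ... */ }
//   return result;
```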
Another common need is to set up global state, for example before starting to record allocations, and to clean it up at exit. In C++, this is easily solved with a global object whose constructor and destructor do the work, plus a flag that the wrapper checks.
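#include <atomic>
#include <cstddef>
#include <cstdint>
#include <dlfcn.h>
namespace {
thread_local std::int64_t busy = 0;
std::atomic_bool initialized{false};
// Constructed when the library is loaded, destroyed when the process exits.
const struct Initialization {
  Initialization() {
    // Some initialization...
    initialized = true;
  }
  ~Initialization() {
    initialized = false;
    // Maybe some cleanup as needed...
  }
} _;
}  // namespace
extern "C" void* malloc(std::size_t size) {
  static auto* next
      = reinterpret_cast<decltype(malloc)*>(dlsym(RTLD_NEXT, "malloc"));
  // Not set up yet, already torn down, or re-entered: just forward.
  if (!initialized || busy > 0) {
    return next(size);
  }
  ++busy;
  // Do any kind of analysis, writing to file, ... here.
  --busy;
  return next(size);
}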
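As an alternative to a global C++ object, GCC and Clang support the constructor and destructor function attributes, which also work in plain C. A minimal sketch follows; the function names and the is_initialized accessor are illustrative. Note that GCC documents the relative order between such functions and C++ static initializers as unspecified, so the two mechanisms should not depend on each other.

```cpp
// init_attr.cpp: constructor/destructor attributes (GCC/Clang extension).
#include <atomic>

namespace {
std::atomic_bool initialized{false};
}  // namespace

// Illustrative name. Runs when the object is loaded (before main for an
// LD_PRELOAD library or for the executable itself).
__attribute__((constructor)) static void on_load() {
  // Some initialization...
  initialized = true;
}

// Illustrative name. Runs when the object is unloaded or the process exits.
__attribute__((destructor)) static void on_unload() {
  initialized = false;
  // Maybe some cleanup as needed...
}

// Hypothetical accessor used only to demonstrate the flag.
bool is_initialized() { return initialized.load(); }
```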
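Interposing on macOS
Interposing is slightly different on macOS, and in some ways easier! First, DYLD_INSERT_LIBRARIES is the equivalent of the LD_PRELOAD environment variable. On the code side, the replacement gets a different name and is registered in the __DATA,__interpose section, which dyld reads to know which function replaces which: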
#include <cstdlib>  // malloc, std::size_t
extern "C" void* interpose_malloc__(std::size_t size) {
  // [...], custom malloc wrapper/implementation.
  // Since the function name is different, you can in fact just call malloc
  // here: dyld does not apply the interposition to calls made from this
  // library itself. Calls that other libraries make on our behalf (e.g.
  // inside std::cout) are still interposed, so re-entrancy can still occur.
  return malloc(size);
}
__attribute__((used)) static struct {
  const void* replacement;
  const void* replacee;
} _interpose_malloc __attribute__((section("__DATA,__interpose")))
    = {(const void*) (unsigned long) &interpose_malloc__,
       (const void*) (unsigned long) &malloc};
Putting it all together
Below is a simple example/template that only writes every malloc and free call to standard output. It works on both Linux and macOS; I’ve added a few helper macros for Linux/macOS compatibility.
// logger.cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <iostream>
#ifdef __APPLE__
#define DYLD_INTERPOSE(_replacement, _replacee) \
  __attribute__((used)) static struct { \
    const void* replacement; \
    const void* replacee; \
  } _interpose_##_replacee __attribute__((section("__DATA,__interpose"))) \
      = {(const void*) (unsigned long) &_replacement, \
         (const void*) (unsigned long) &_replacee}
#define GET_NEXT_FUNCTION(NAME) ::NAME
#define INTERPOSE_FUNCTION_NAME(NAME) interpose_##NAME##__
#define INTERPOSE(NAME) DYLD_INTERPOSE(INTERPOSE_FUNCTION_NAME(NAME), NAME)
#else
#include <dlfcn.h>
#define GET_NEXT_FUNCTION(NAME) \
  (reinterpret_cast<decltype(::NAME)*>(dlsym(RTLD_NEXT, #NAME)))
#define INTERPOSE_FUNCTION_NAME(NAME) NAME
#define INTERPOSE(NAME)
#endif
namespace {
std::atomic_bool initialized{false};
thread_local std::int64_t busy = 0;
const struct Initialization {
  Initialization() {
    // Some initialization...
    initialized = true;
  }
  ~Initialization() {
    initialized = false;
    // Maybe some cleanup as needed...
  }
} _;
}  // namespace
extern "C" void* INTERPOSE_FUNCTION_NAME(malloc)(std::size_t size) {
  static auto* next = GET_NEXT_FUNCTION(malloc);
  void* result = next(size);
  if (!initialized || busy > 0) {
    return result;
  }
  ++busy;  // std::cout may allocate: guard against re-entering this wrapper.
  std::cout << "malloc(" << size << ") -> " << result << std::endl;
  --busy;
  return result;
}
INTERPOSE(malloc);
extern "C" void INTERPOSE_FUNCTION_NAME(free)(void* pointer) {
  static auto* next = GET_NEXT_FUNCTION(free);
  next(pointer);
  if (!initialized || busy > 0) {
    return;
  }
  ++busy;
  std::cout << "free(" << pointer << ")" << std::endl;
  --busy;
}
INTERPOSE(free);
Compile and use with these commands:
# On Linux.
$ clang++ -shared -fPIC -o liblogger.so logger.cpp -ldl
$ LD_PRELOAD=./liblogger.so ls
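# On macOS.
# Note: System Integrity Protection strips DYLD_* variables for protected system
# binaries such as /bin/ls; if nothing is logged, try a binary you built yourself.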
$ clang++ -shared -fPIC -o liblogger.dylib logger.cpp -ldl
$ DYLD_INSERT_LIBRARIES=./liblogger.dylib ls
Interposing on Windows
Interposition as described here ranges from overly complicated to impossible on Windows, which has no direct equivalent of LD_PRELOAD. Switch to a Unix system.