3

I'm reading the Driving Compilers series of articles by Fabien Sanglard. In part 3 about The Compiler there is an example for hooking malloc function. First, an incorrect solution that falls in an infinite recursion is shown:

void* malloc(size_t sz) {
  void *(*libc_malloc)(size_t) = dlsym(RTLD_NEXT, "malloc");
  printf("malloced %zu bytes\n", sz);
  return libc_malloc(sz);
}

The reason for the recursion is that dlsym function internally calls malloc.

After that, a supposedly fixed solution is provided:

#include <stdio.h>
#include <dlfcn.h>

static void* (*real_malloc)(size_t) = nullptr;

void *malloc(size_t size) {
    if(!real_malloc)  {
      real_malloc = dlsym(RTLD_NEXT, "malloc");
    }

    printf("malloc(%d) = ", size);
    return real_malloc(size);
}

Except that the solution is not really fixed because it suffers from the same problem. If I rename the function from malloc to my_malloc for example then it will work, but it is no longer a hook because the other software uses malloc but not my_malloc. Is there any solution to the problem?

bobeff
  • 3,543
  • 3
  • 34
  • 62
  • 1
    Maybe add which libc implementation you are using (e.g. glibc, musl, etc.). I have some doubts that they will behave the same. – user17732522 May 17 '23 at 21:21
  • 1
    *"dlsym function internally calls malloc"* Are you sure about that? glibc 2.17 on CentOS7 doesn't appear to based on valgrind. – dbush May 17 '23 at 21:23
  • @dbush I'm not sure but this is mentioned in the article. It is possible the infinite recursion to be not the case because putting `printf` at the beginning of the hook prints nothing before the segfault. – bobeff May 17 '23 at 21:31
  • @user17732522 I don't know how to check which is the *libc* implementation on my system but I'm on the latest Ubuntu which according to a web search seems to use *glibc*. – bobeff May 17 '23 at 21:35
  • A little more investigation shows that what really calls malloc and an infinite recursion is the `printf` function but not the `dlsym`. This reduces the question to how to do the printing the right way. – bobeff May 17 '23 at 21:41
  • 1
    that should be `printf("malloc(%zu) = \n", size);` – yano May 17 '23 at 21:41
  • 1
    you could try [`puts`](https://man7.org/linux/man-pages/man3/puts.3.html), don't know if that internally calls `malloc`, I'd guess not. Might have to get creative with it if you also want to see the size .. print to a local char array perhaps, then `puts` that, or maybe `putchar` one byte of the size at a time ? – yano May 17 '23 at 21:46
  • 1
    @yano Not really - `printf()` is likely to call `malloc()` itself internally and cause infinite recursion. `printf()` and all related functions can't safely be used. – Andrew Henle May 17 '23 at 21:46
  • 1
    @yano Yes. `puts()` and, unless you have something platform-specific saying a function is reentrant, the entire list of functions in the C standard, per C11 7.1.4p4: "The functions in the standard library are not guaranteed to be reentrant ..." – Andrew Henle May 17 '23 at 22:16
  • @AndrewHenle `puts` being a "related function"? Or just the print-to-string functions? – yano May 17 '23 at 22:16
  • 1
    @AndrewHenle gotcha ,, you're too fast, screwed up the chronology ;) thanks for the reply – yano May 18 '23 at 06:03

2 Answers2

2

First, per 7.1.4 Use of library functions, paragraph 4 of the (draft) C11 standard:

The functions in the standard library are not guaranteed to be reentrant and may modify objects with static or thread storage duration.

So you can't safely use any function from the C standard and be safe. You need to rely on platform-specific solutions.

First, you can find what functions your platform allows to be called from within signal handlers - these functions pretty much have to be reentrant.

For POSIX-based systems, which I'm assuming you're using because you use the POSIX function dlsym(), you can start with 2.4 Signal Concepts which has an extensive list.

Note that write() is on the list of async-signal-safe functions, but printf() is not.

So your code

static void* (*real_malloc)(size_t) = nullptr;

void *malloc(size_t size) {
    if(!real_malloc)  {
      real_malloc = dlsym(RTLD_NEXT, "malloc");
    }

    printf("malloc(%d) = ", size);
    return real_malloc(size);
}

can be replaced with

static void* (*real_malloc)(size_t) = nullptr;

void *malloc(size_t size) {
    if(!real_malloc)  {
      static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

      pthread_mutex_lock( &mutex );
      if(!real_malloc)  {
        real_malloc = dlsym(RTLD_NEXT, "malloc");
      }
      pthread_mutex_unlock( &mutex );
    }

    write( STDOUT_FILENO, "malloc()", strlen( "malloc()" );
    return real_malloc(size);
}

Note that I omitted the address - that would need to be converted to a string. It's not hard to do, and examples on how to do that can easily be found, such as in the answers to How can I convert an int to a string in C?. Note that you can't use any answers there that use a standard library function that isn't async-signal-safe.

And if you're running on Solaris, s[n]printf() actually is async-signal-safe there.

I also added some protection for multithreaded use - there's a race condition in obtaining the pointer value to the actual malloc() that should be protected against, if only because if the pointer value gets corrupted you will likely never be able to reproduce whatever error(s) that causes.

EDIT

Per the comment from @ChrisDodd, fixed to address concerns about safety of dlsym():

static void* (*real_malloc)(size_t) = NULL;

__attribute__((constructor))
static void initValues(void) {
    real_malloc = dlsym(RTLD_NEXT, "malloc");
}

void *malloc(size_t size) {
    write( STDOUT_FILENO, "malloc()", strlen( "malloc()" );
    return real_malloc(size);
}

Note that the code inside the malloc() replacement is now much simpler - there's no race condition possible.

Andrew Henle
  • 32,625
  • 3
  • 24
  • 56
  • 1
    per the POSIX spec, dlsym is *not* async safe, so using it could be a problem. In practice it is only likely to call malloc if called with a handle to library opened by dlopen (ie, not when called with the special handles RTLD_DEFAULT or RTLD_NEXT, as they only look at already fully loaded and resolve objects) – Chris Dodd May 17 '23 at 22:38
  • @ChrisDodd Good point. Although I'm using async-signal-safe here as a proxy for both being reentrant and "doesn't use `malloc()` *et al*" (using `malloc()` or any related function would make a function async-signal-**un**safe). `dlsym()` could indeed be dangerous - and it won't be reentrant because it pretty much has to use internal locks to make changes to the process's address space, in addition to its potential use of `malloc()`. As you said, though, just looking up a symbol should be safe. – Andrew Henle May 17 '23 at 22:46
  • @ChrisDodd I updated the code to address potential `dlsym()` issues. – Andrew Henle May 17 '23 at 22:55
  • Were signal handlers mentioned in the Original Post? – Lorinczy Zsigmond May 18 '23 at 15:47
1

A little more investigation shows that what really calls malloc and leads to an infinite recursion is the printf function call, but not the dlsym as written in the article. This reduces the question to how to do the printing in such a way that this not happens. I came up with the following solution:

#include <stdio.h>
#include <dlfcn.h>

static void* (*real_malloc)(size_t) = NULL;

void* malloc(size_t size) {
  if(!real_malloc)  {
    real_malloc = dlsym(RTLD_NEXT, "malloc");
  }

  static char isPrintF = 0;
  if (isPrintF) {
    return real_malloc(size);
  }

  char* p = real_malloc(size);
  isPrintF = 1;
  printf("malloc(%zu) = %p\n", size, p);
  isPrintF = 0;
  return p;
}
bobeff
  • 3,543
  • 3
  • 34
  • 62
  • 1
    Note though, that using a static flag like you do with `isPrintF` isn't multithread-safe. – Andrew Henle May 17 '23 at 22:09
  • snprintf + write maybe? – Erki Aring May 18 '23 at 07:51
  • @ErkiAring `snprintf()` isn't guaranteed to be work - it's likely to use `malloc()` or similar functions internally. The only platform I know where `s[n]printf()` is safe to use here is Solaris. Linux is definitely not safe as glibc's `s[n]printf()` definitely does use `malloc()` internally. – Andrew Henle May 18 '23 at 10:49