
I read online that most modern UNIX systems come with a thread-safe malloc() by default. I know this simply means that a thread can call malloc() safely while another thread is already in the middle of a malloc() call itself.

I am using pthreads for my multithreading. I have a 12-core CPU with 2 threads per core, so 24 hardware threads in total. I'm also using the GNU C library (glibc) implementation of malloc.

My question is about multiple threads calling malloc() at the same time without locking/waiting/blocking. I read in an answer that malloc() "uses internal locking mechanisms" when multiple threads are calling it at the same time.

So here's my question exactly:

If 8 threads happen to call malloc() at the exact same time, would there be 8 malloc calls happening in parallel, and they won't interfere with each other whatsoever?

Or is it the case that when one thread calls malloc(), the other threads MUST WAIT for this thread's malloc call to finish BEFORE they can proceed with their own malloc calls?

(I'm asking this because I just multithreaded a C program of mine that makes extensive use of malloc() and free(), and the speedup was not linear in the number of threads, even though I expected it to be: none of the threads rely on anything global, so no contention should be happening, in software anyway. My scenario is simple. Each thread calls a function that takes roughly 315 seconds to complete on 1 thread (no multithreading), and that function makes millions of calls to other functions I've defined. Since function code is read-only, X threads running this top-level function in parallel should scale, given that each thread calls it with its own arguments and no thread relies on anything global or shared. Yet when I used 4 threads, the time went up from 315 seconds to 710 seconds, and with 8 threads it went up to 1400 seconds, even though each thread does exactly the same work that the single thread was doing in 315 seconds. So, what the hell??)
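
Here is a simplified sketch of the structure; the names are placeholders, not my real code:

```c
#include <pthread.h>

#define NUM_THREADS 8   /* I have tried 4 and 8 */

/* stand-in for my real top-level function: ~315 s of work on one thread,
 * millions of calls to my own functions, lots of malloc()/free() inside */
static void *heavy_work(void *arg)
{
    int id = *(int *)arg;
    (void)id;
    /* ... computation using only this thread's own data ... */
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_THREADS];
    int args[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i] = i;   /* each thread gets its own argument, nothing shared */
        pthread_create(&tid[i], NULL, heavy_work, &args[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```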

  • I humbly think you should rather ask a question about your _actual_ problem (so, why you do not benefit from a speed-up even though you are splitting work across threads), and show the code you wrote. Additional note: adding more threads doesn't necessarily mean it will become faster (this is known as [Amdahl's law](https://en.wikipedia.org/wiki/Amdahl%27s_law)). – Erel Aug 31 '23 at 19:54
  • No, I simply want to find out whether malloc() blocks other threads from calling it while one thread is already in the middle of a call to it. – Kevin Stefanov Aug 31 '23 at 19:59
  • `malloc` will do an _internal_ version of `pthread_mutex_lock/pthread_mutex_unlock`, so no worries ... FYI, `pthread_*` will call these internal/low-level lock functions under the hood. As to your speed issues, you could have the main/master thread do all allocations before the first `pthread_create` and put them in a per-thread struct (the last arg to `pthread_create` points to this). We really need to see your code. Please edit your question and post your code. It's easy to do a problem breakdown that is worse with threads. – Craig Estey Aug 31 '23 at 20:08
  • Doing massive `malloc/free` inside the thread _is_ going to be a bottleneck. There are allocation strategies to mitigate this (e.g. slab allocation). For an example, see my answer: [cs50 pset5 Speller optimisation](https://stackoverflow.com/a/71316468/5382650) – Craig Estey Aug 31 '23 at 20:13
  • The glibc version of `malloc` will use separate arenas for the individual threads, if thread contention gets too high. That way, the allocators of the individual threads will generally not hinder each other. See [this page](https://sourceware.org/glibc/wiki/MallocInternals#Arenas_and_Heaps) for further information. – Andreas Wenzel Aug 31 '23 at 20:23
  • Unless you are using [NUMA](https://en.wikipedia.org/wiki/Non-uniform_memory_access), all of your CPU cores will be using the same memory controller. So if that is the bottleneck, then adding additional CPU cores will not give you a significant speed increase. – Andreas Wenzel Aug 31 '23 at 21:08

2 Answers


> If 8 threads happen to call malloc() at the exact same time, would there be 8 malloc calls happening in parallel, and they won't interfere with each other whatsoever?

It depends on the malloc() implementation, among other things. Modern C standard libraries for general-purpose operating systems generally cater to simultaneous multiprocessing.

Glibc's malloc, for example, maintains multiple memory arenas from which to allocate, so as to avoid a single malloc() call forcing all others to block until it completes. It manages these adaptively, but by default allows up to eight times as many arenas as there are CPUs in the system. This is per process, of course. If you are running on a Glibc-based system, then, it might indeed be that your 8 malloc calls all proceed simultaneously. "No interference whatsoever" is a very high bar, but I think it would be safe to say that there would usually be minimal interference.
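
If you want to see how sensitive your program is to the arena count, glibc lets you cap it at run time. A minimal, glibc-specific sketch (the limit of 8 is an arbitrary value chosen only for illustration):

```c
#include <malloc.h>   /* glibc-specific: mallopt() and M_ARENA_MAX */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Cap the number of malloc arenas for this process (glibc only).
     * The same limit can be set without recompiling via the
     * MALLOC_ARENA_MAX environment variable. mallopt() returns 0 on error. */
    if (mallopt(M_ARENA_MAX, 8) == 0)
        fprintf(stderr, "mallopt(M_ARENA_MAX) failed\n");

    void *p = malloc(1024);   /* subsequent allocations use at most 8 arenas */
    free(p);
    return 0;
}
```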

The answer might be different on other systems. In particular, Windows' allocator has a poor performance reputation in general, though I don't know specifically how well it handles multithreaded apps.


Nevertheless, if your threads are doing so much dynamic memory management that you supposed it was a likely source of your performance issue, then that's probably too much. Even if it's not specifically an issue for scaling up the number of threads, malloc() and free() are comparatively slow, so you should try to minimize their use where performance is important.
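
For instance, a per-thread bump/slab-style pool turns many small malloc()/free() pairs into one large allocation per thread. A rough sketch (the layout and alignment here are illustrative assumptions, not a production allocator):

```c
#include <stddef.h>
#include <stdlib.h>

/* A trivial per-thread bump allocator: one big malloc() up front,
 * many cheap sub-allocations, one free() at the end. Each thread owns
 * its own pool_t, so no locking is needed. */
typedef struct {
    char  *base;
    size_t used;
    size_t cap;
} pool_t;

static int pool_init(pool_t *p, size_t cap)
{
    p->base = malloc(cap);
    p->used = 0;
    p->cap  = cap;
    return p->base != NULL;
}

static void *pool_alloc(pool_t *p, size_t n)
{
    n = (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
    if (p->used + n > p->cap)
        return NULL;              /* out of pool space */
    void *ptr = p->base + p->used;
    p->used += n;
    return ptr;
}

static void pool_destroy(pool_t *p)
{
    free(p->base);                /* releases everything at once */
}
```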

John Bollinger

Yes, malloc() will block any other threads calling malloc() at the same time, until that one is done. (Though see comment below on malloc arenas, which can greatly reduce the contention.) You can try tcmalloc for efficient allocations from many threads. Alternatively, you can often find a way to allocate the memory that will be needed for each thread ahead of time, and then free it all after they return.
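
As a rough sketch of the allocate-ahead-of-time approach (the struct layout and scratch size are made-up values for illustration, not taken from the question):

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_THREADS   8
#define SCRATCH_BYTES (1u << 20)   /* per-thread scratch size: made-up value */

typedef struct {
    int   id;
    char *scratch;                 /* pre-allocated working memory */
} thread_ctx;

static void *worker(void *arg)
{
    thread_ctx *ctx = arg;
    /* ... do the work using ctx->scratch instead of calling malloc() ... */
    (void)ctx;
    return NULL;
}

int main(void)
{
    pthread_t  tid[NUM_THREADS];
    thread_ctx ctx[NUM_THREADS];

    /* All allocation happens in the main thread, before any worker starts. */
    for (int i = 0; i < NUM_THREADS; i++) {
        ctx[i].id = i;
        ctx[i].scratch = malloc(SCRATCH_BYTES);
        pthread_create(&tid[i], NULL, worker, &ctx[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(tid[i], NULL);
        free(ctx[i].scratch);      /* freed only after the worker is done */
    }
    return 0;
}
```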

This may or may not (I'm guessing not) be the source of your lack of scaling with processors. You need to ask a question with your actual problem — not your guess about what's causing the problem.

Mark Adler
  • That depends on the malloc implementation. For example, Glibc's malloc achieves thread safety in part by setting up different "arenas" from which memory can be allocated, so as to improve efficiency when multiple threads attempt to allocate memory at the same time. One thread running in this `malloc` generally *will not* block concurrent `malloc` execution by other threads. – John Bollinger Aug 31 '23 at 20:59
  • I do use glibc's malloc implementation. That means I have to look elsewhere to figure out why I'm not achieving linear speedup from threads? – Kevin Stefanov Aug 31 '23 at 21:14
  • I'd say so. glibc does limit the number of arenas, so it might be possible to still have high contention if you have many more threads than cores. – Mark Adler Aug 31 '23 at 23:42