
Generic theory question.

I'm trying to build a library from source (FFTW if anyone cares, but it really doesn't matter), and I noticed there is an option to disable the use of alloca.

I'm aware of the dangers of using alloca, but I'm assessing the performance of FFTW with and without alloca.

Does alloca have known issues with thread safety? I'm seeing an extreme performance hit when I use a certain number of threads with FFTW (which is obviously calling alloca in the background). I'm sticking to using a number of threads equal to powers of 2, if that matters.

Is it possible that FFTW is sharing objects on a thread-local stack via alloca? I'm just trying to figure out why I see such extreme performance hits with certain numbers of threads. However, I don't fully understand the theory behind what alloca is really doing with threads.

user42390
  • `char *s = alloca(10);` is as safe as writing `char s[10];`. Indeed, alloca is (almost?) obsolete with the invention of VLA. – Ctx Jan 30 '19 at 13:38
  • When you say that you "see such extreme performance hits with certain numbers of threads", how many threads are you talking about here? How many cores are there in your machine? – Some programmer dude Jan 30 '19 at 13:38
  • @Someprogrammerdude: I have 112 cores, 56 of which are hyperthreads. If I use 1, 2, 16, or 32 threads, my performance is fine. (More threads used, better performance.) However, if I use 4 or 8 threads, the performance is 38x *slower* than with 32 threads. – user42390 Jan 30 '19 at 13:47
  • As for `alloca`, it basically just adjusts the stack pointer to fit the memory you want to allocate on the stack. – Some programmer dude Jan 30 '19 at 13:51
  • One possible thing that can happen: Thread stacks are strongly aligned so that data on the stack is good for use with aligned instructions like SSE or AVX. This strong alignment can cause cache false sharing https://en.wikipedia.org/wiki/False_sharing which can destroy thread performance by convincing the cache that different memory locations should be on the same cache line, causing unnecessary cache evictions. Try having each of your threads call alloca(sizeof your_data * thread_number) before allocating the real data, or allocate more than you need and use a start offset. – Zan Lynx Jan 30 '19 at 16:46
  • False sharing may be the wrong term there. I mean the thing where the cache is confused because multiple memory locations map to the same cache line. Aliasing? It's a form of false sharing, I know. But I can't find the term at the moment. – Zan Lynx Jan 30 '19 at 16:55
  • Like this: https://stackoverflow.com/questions/15016359/cache-coloring-on-slab-memory-management-in-linux-kernel – Zan Lynx Jan 30 '19 at 16:56
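The staggering trick suggested in the comments can be sketched roughly as below. This is a minimal sketch under stated assumptions, not FFTW's code: it assumes POSIX threads, the non-standard `<alloca.h>` header (glibc/BSD), the GCC/Clang `noinline` attribute, and a hypothetical 64-byte stagger per thread (one typical cache line). The names `run_staggered`, `worker`, `do_work` and `STAGGER_BYTES` are illustrative only. Each thread burns a thread-dependent amount of stack before calling the function that holds the real data, so identically laid-out frames no longer start at the same offset in every thread's stack.

```c
#include <alloca.h>   /* non-standard; provided by glibc and the BSDs */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define STAGGER_BYTES 64   /* hypothetical stagger; tune for the target cache */
#define MAX_THREADS   64

struct worker_arg {
    int index;             /* thread index 0 .. n-1 */
    uintptr_t buf_addr;    /* where this thread's working buffer ended up */
};

/* noinline (GCC/Clang extension) so that buf lives in a new frame *below*
 * the padding; if this were inlined, buf could sit in worker's fixed frame
 * and the padding would not shift it. */
__attribute__((noinline)) static void do_work(struct worker_arg *a)
{
    double buf[256];                 /* stand-in for real stack-resident data */
    memset(buf, 0, sizeof buf);
    a->buf_addr = (uintptr_t)buf;
}

static void *worker(void *p)
{
    struct worker_arg *a = p;
    assert(a->index >= 0);
    /* Thread-dependent padding: every frame called after this point starts
     * roughly index * STAGGER_BYTES further down this thread's stack. */
    volatile char *pad = alloca((size_t)a->index * STAGGER_BYTES + 1);
    pad[0] = 0;                      /* touch it so the allocation is kept */
    do_work(a);
    return NULL;
}

/* Runs n staggered workers and stores thread i's buffer address in addrs[i].
 * Returns 0 on success, -1 on a bad n or a thread-creation failure. */
int run_staggered(int n, uintptr_t *addrs)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    if (n < 1 || n > MAX_THREADS)
        return -1;
    for (int i = 0; i < n; i++) {
        args[i].index = i;
        args[i].buf_addr = 0;
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0) {
            for (int j = 0; j < i; j++)
                pthread_join(tid[j], NULL);
            return -1;
        }
    }
    for (int i = 0; i < n; i++) {
        pthread_join(tid[i], NULL);
        addrs[i] = args[i].buf_addr;
    }
    return 0;
}
```

Compile with something like `cc -O2 -pthread`. Whether this helps depends on whether cache-set aliasing between thread stacks is really the bottleneck, which would need to be confirmed by profiling; the sketch only shows the mechanics of the offset.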

1 Answer


Short answer: it doesn't. alloca() is MT-Safe (the glibc man page marks it as such).


Longer answer: alloca() isn't a complicated function. It returns a pointer to stack space that is freed automatically when the calling function returns. Please note that using it is no longer considered good practice. From the man page:

The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behaviour is undefined.

As you can see, allocation can cause a stack overflow, because the space is allocated on the stack simply by moving the stack pointer (SP). Threads share the heap, but each thread has its own stack, so multithreaded use of alloca() cannot, by itself, make one thread's allocation collide with another's.
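The "each thread has its own stack" point can be illustrated (though not proven) with a small experiment. In this sketch, which assumes POSIX threads and the non-standard `<alloca.h>` header, each of a few threads alloca()s a block, fills it with its own id, yields so the others can run, and then checks that the block still holds only its own id. The names `count_corrupted_blocks`, `fill_and_check`, `NTHREADS` and `BLOCK` are hypothetical.

```c
#include <alloca.h>   /* non-standard; provided by glibc and the BSDs */
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

#define NTHREADS 8
#define BLOCK    4096           /* arbitrary block size per thread */

static char corrupted_marker;                 /* returned when foreign data is seen */
static unsigned char *block_addr[NTHREADS + 1]; /* one slot per thread id */

static void *fill_and_check(void *p)
{
    int id = *(const int *)p;
    assert(id > 0 && id <= NTHREADS && id <= UCHAR_MAX);
    unsigned char *block = alloca(BLOCK);    /* lives on *this* thread's stack */
    memset(block, id, BLOCK);
    block_addr[id] = block;   /* let the pointer escape so the compiler must
                                 assume other code could write to the block */
    for (int round = 0; round < 100; round++)
        sched_yield();                       /* give the other threads a chance to run */
    for (int i = 0; i < BLOCK; i++)
        if (block[i] != (unsigned char)id)
            return &corrupted_marker;        /* another thread's data showed up */
    return NULL;                             /* block untouched by other threads */
}

/* Returns how many threads found foreign data in their alloca() block,
 * or -1 if not all threads could be created. */
int count_corrupted_blocks(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    int created = 0, bad = 0;
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i + 1;                      /* ids 1..NTHREADS */
        if (pthread_create(&tid[i], NULL, fill_and_check, &ids[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        void *res = NULL;
        pthread_join(tid[i], &res);
        if (res != NULL)
            bad++;
    }
    return created == NTHREADS ? bad : -1;
}
```

A result of 0 is what the per-thread-stack model predicts. Note that this only shows alloca() itself does not share memory between threads; it says nothing about cache effects between different threads' stacks.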

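A safer way would be to use a VLA instead of alloca(). Both allocate on the stack, but a VLA's storage is released at the end of its enclosing block rather than when the function returns, and (I suspect) a VLA is also faster and lighter.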
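The lifetime difference is what makes the VLA version safer inside loops. The following is a minimal sketch assuming a C99 compiler with VLA support (VLAs are optional in C11 and later) and the non-standard `<alloca.h>`. The function names are hypothetical. Both `sum_alloca` and `sum_vla` place `n` ints on the stack. `repeated_vla` shows the loop case: its VLA is released at the end of each iteration, whereas an alloca() call in the same place would keep `rounds` buffers alive until the function returns.

```c
#include <alloca.h>   /* non-standard; provided by glibc and the BSDs */
#include <assert.h>
#include <stddef.h>

/* alloca() version: the buffer lives until sum_alloca returns. */
long sum_alloca(int n)
{
    assert(n > 0);
    int *a = alloca((size_t)n * sizeof *a);
    long s = 0;
    for (int i = 0; i < n; i++) a[i] = i;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

/* VLA version: same stack allocation, released at the end of the block. */
long sum_vla(int n)
{
    assert(n > 0);                   /* a zero-length VLA is undefined behaviour */
    int a[n];
    long s = 0;
    for (int i = 0; i < n; i++) a[i] = i;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

/* The VLA is freed at the end of every iteration, so peak stack use stays
 * at one buffer. Replacing it with alloca() here would accumulate
 * rounds * n ints on the stack before the function returns. */
long repeated_vla(int rounds, int n)
{
    assert(n > 0);
    long total = 0;
    for (int r = 0; r < rounds; r++) {
        int a[n];
        for (int i = 0; i < n; i++) a[i] = i;
        for (int i = 0; i < n; i++) total += a[i];
    }
    return total;
}
```

For example, `sum_alloca(10)` and `sum_vla(10)` both return 45, and `repeated_vla(1000, 1000)` uses about 4 KB of stack at a time instead of about 4 MB.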

Kamila Szewczyk