5

I am tuning a high-performance multi-threaded application and I suspect that false sharing may be causing the poor performance. How can I verify that this is the case?

I am running C++ on Ubuntu 12.04 using gcc 4.8.2.

Nathan Doromal
  • 3
    That is nowhere near enough information to give you anything useful. – sjdowling Oct 16 '14 at 12:31
  • Remove the false sharing and see if performance improves. – molbdnilo Oct 16 '14 at 12:42
  • @molbdnilo I suspect it is happening but I am not sure if it is. I am looking for a definitive way to say "yes false sharing is occurring" at which point then I can remove it and see if performance improves. – Nathan Doromal Oct 16 '14 at 12:46
  • http://stackoverflow.com/questions/7079950/tools-to-detect-false-sharing-in-a-c-c-application – marcinj Oct 16 '14 at 12:52
  • 4
    False sharing is just a specific way in which cache performance can be bad. So I'd suggest [this question](http://stackoverflow.com/questions/10082517/simplest-tool-to-measure-c-program-cache-hit-miss-and-cpu-time-in-linux), and in particular [perf](https://perf.wiki.kernel.org/index.php/Tutorial#Counting_with_perf_stat) to examine cache misses. Running the wikipedia false sharing example with `perf stat -e cache-misses`: with OpenMP, 129515 cache-misses. Without OpenMP (serial): 1551 cache-misses. Clearly some thread-related cache problems there. – Jonathan Dursi Oct 16 '14 at 13:40

2 Answers

5

I'll post something I just came across from C++ Concurrency in Action.

One way to test for false sharing is to add huge blocks of padding between data elements that can be concurrently accessed by different threads.

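#include <mutex>

struct protected_data
{
   std::mutex m;
   char padding[65536];     // far larger than a cache line, so m and the data cannot share one
   my_data data_to_protect; // my_data stands for the application's own type
};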

If this improves performance then you know that false sharing was a problem.
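To make the padding test concrete, here is a minimal sketch that times two threads incrementing their own counters, once with the counters adjacent and once with padding between them. The names `adjacent_counters`, `padded_counters` and `time_increments` are made up for illustration, and the 64-byte padding assumes a 64-byte cache line, which is typical for x86-64 but not guaranteed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Two counters laid out back to back: they will very likely share a cache line.
struct adjacent_counters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// The same counters separated by padding. 64 bytes is an ASSUMED cache-line
// size; with it, a and b are more than one line apart.
struct padded_counters {
    std::atomic<long> a{0};
    char padding[64];
    std::atomic<long> b{0};
};

// Starts two threads, each incrementing only its own counter, and returns
// the elapsed wall-clock time in milliseconds.
template <typename Counters>
double time_increments(Counters& c, long iterations) {
    assert(iterations >= 0);
    auto work = [iterations](std::atomic<long>& counter) {
        for (long i = 0; i < iterations; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    };
    auto start = std::chrono::steady_clock::now();
    std::thread t1(work, std::ref(c.a));
    std::thread t2(work, std::ref(c.b));
    t1.join();
    t2.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Call `time_increments` from `main` on one object of each type with the same iteration count (build with `-std=c++11 -pthread`) and compare the two times. If the padded version is consistently and clearly faster, that points to false sharing in the adjacent layout. Timings are noisy, so repeat the runs a few times before drawing a conclusion.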

Nathan Doromal
  • 1
    Padding can indeed help a lot, although 64k is probably excessive. See http://stackoverflow.com/questions/8620303/how-many-bytes-does-a-xeon-bring-into-the-cache-per-memory-access and do visit the article linked in the accepted answer. Also, instead of manually adding paddings you could use the `aligned` type attribute [available on gcc](https://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Type-Attributes.html) – fvu Oct 17 '14 at 15:40
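As a rough sketch of the alignment approach mentioned in the comment above, the standard C++11 `alignas` specifier (supported by gcc 4.8) can replace the manual padding. The names `cache_line_size`, `aligned_counter`, `per_thread_counters` and `same_cache_line` are illustrative, and the 64-byte line size is an assumption to check against your CPU.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// ASSUMED cache-line size; 64 bytes is typical for x86-64.
constexpr std::size_t cache_line_size = 64;

// alignas makes every object start on a cache-line boundary, and sizeof is
// rounded up to a multiple of the alignment, so array elements never share
// a line.
struct alignas(cache_line_size) aligned_counter {
    std::atomic<long> value{0};
};

static_assert(alignof(aligned_counter) == cache_line_size,
              "each counter starts on a cache-line boundary");
static_assert(sizeof(aligned_counter) == cache_line_size,
              "each counter occupies a whole cache line");

// One counter per thread; neighbouring elements are a full line apart.
aligned_counter per_thread_counters[4];

// Helper for checking a layout: true if both addresses fall in the same
// (assumed 64-byte) cache line.
inline bool same_cache_line(const void* p, const void* q) {
    return reinterpret_cast<std::uintptr_t>(p) / cache_line_size ==
           reinterpret_cast<std::uintptr_t>(q) / cache_line_size;
}
```

The gcc-specific spelling is `__attribute__((aligned(64)))` on the struct. One caveat: before C++17, plain `new` is not required to honour alignments this large, so this sketch relies on static or automatic storage.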
3

The answer is much the same as for any other performance question along the lines of "How do I know if X is slowing my program down?" or "Will Y be faster?": break out your profiler and profile it.

For this particular case, you would check whether an unusual amount of time is spent on instructions that access memory. Also, if you are using your CPU vendor's profiler (CodeXL for AMD or VTune for Intel), you can profile by cache misses and see which lines of code and instructions are flushing your cache lines.

You might want to read this article for more.

sjdowling