False sharing and stack variables

Question

I have small but frequently used function objects. Each thread gets its own copy. Everything is allocated statically. The copies don't share any global or static data. Do I need to protect these objects from false sharing?

EDIT: Here is a toy program that uses Boost.Threads. Can false sharing occur on the data member?

#include <boost/thread/thread.hpp>

struct Work {
    void operator()() {
        ++data;  // each thread increments the data member of its own copy
    }

    int data;
};

int main() {
    boost::thread_group threads;
    for (int i = 0; i < 10; ++i)
        threads.create_thread(Work());  // Work() value-initializes data to 0
    threads.join_all();
}

Showing code would help. If your function objects have `static` data, then all the threads will share that data. — GManNickG, Jul 26 '10 at 06:58
I think you need to say exactly what you mean by "each thread gets its own copy" and "allocated statically". Do threads use each other's copies? — Elemental, Jul 26 '10 at 07:33
@Elemental: Some compilers support TLS (thread-local storage). This means you can allocate statically AND thread-safely, although access can be slow. — Puppy, Jul 26 '10 at 07:38

Christopher · Accepted Answer · score 6 · 2010-07-26T10:24:47.507

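False sharing occurs when two or more threads access different variables that happen to lie on the same cache line, and at least one of the threads writes to it. Every write then invalidates the line in the other cores' caches, even though the threads never touch each other's data.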

For example:

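#include <boost/thread/thread.hpp>

// Assumption: 64-byte cache lines; this is the number of ints per line.
const int CACHELINE_SIZE_INTS = 64 / sizeof(int);

struct Work {
    Work( int& d) : data( d ) {}
    void operator()() {
        ++data;  // writes into the caller's array through the reference
    }

    int& data;
};

int main() {
    // Adjacent ints: all ten threads write into the same one or two cache lines.
    int false_sharing[10] = { 0 };
    boost::thread_group threads;
    for (int i = 0; i < 10; ++i)
        threads.create_thread(Work(false_sharing[i]));
    threads.join_all();

    // Each thread's int is a full cache line away from its neighbours.
    int no_false_sharing[10 * CACHELINE_SIZE_INTS] = { 0 };
    boost::thread_group padded_threads;  // fresh group; the first one is already joined
    for (int i = 0; i < 10; ++i)
        padded_threads.create_thread(Work(no_false_sharing[i * CACHELINE_SIZE_INTS]));
    padded_threads.join_all();
}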

The threads in the first block suffer from false sharing. The threads in the second block do not, thanks to the CACHELINE_SIZE_INTS spacing between their ints.

Data on a thread's stack is always 'far' away from the data of other threads (under Windows, for example, at least a couple of pages away).

With your definition of the function object, false sharing can occur, because the copies of Work are created on the heap and each thread then works on its heap copy.

Several Work instances may therefore end up adjacent in memory and share cache lines.

That said, your sample does not make much sense: data is never used outside the thread, so keeping it in the heap-allocated object invites false sharing for no benefit.

The easiest way to prevent problems like this is to copy your 'shared' data onto the stack, work on the stack copy, and copy the result back to the output variable when the work is finished.

For example:

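struct Work {
    Work( int& d, int n ) : data( d ), lengthy_op( n ) {}
    void operator()()
    {
        int tmp = data;                 // copy the shared value onto this thread's stack
        for( int i = 0; i < lengthy_op; ++i )
           ++tmp;                       // the hot loop only touches the local copy
        data = tmp;                     // write the result back once, at the end
    }

    int& data;
    int lengthy_op;                     // iteration count of the expensive loop
};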

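This avoids false sharing in the hot loop: the shared cache line is touched only when data is read at the start and written back at the end.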

edited Jul 26 '10 at 10:24

answered Jul 26 '10 at 09:55

Christopher


Are you saying that data can be affected by false sharing? In my case, copying it onto the function's stack won't help, because the function itself must be called frequently and uses data only once per call. – user401947 Jul 26 '10 at 11:21
When the function must be called very frequently, it doesn't make sense to create a thread every time. Either you do a lot of work in each new thread, or you just burn cycles on thread creation and destruction. In the latter case, the enormous cost of the threads overshadows the cost of false sharing. – Christopher Jul 26 '10 at 11:47
Nevertheless, if you cannot copy data onto the stack for your operation, just make 'Work' at least CACHELINE_SIZE bytes long. You lose a few bytes, but you can be sure that the data members of two Work objects never share a cache line. – Christopher Jul 26 '10 at 11:48
Each thread calls that function many, many times before the work is done. I didn't show any loops for brevity. Each call can be executed independently; that is why I want to use multithreading. – user401947 Jul 26 '10 at 11:56
Then the answer degrades even further to 'it depends'. If you control the allocation of the function objects, you can easily keep them off each other's cache lines. If you cannot control it, pass the 'data' fragments in as references and control their allocation on your side. If you cannot do that either, enlarge the structure with 'unused' bytes to enforce cache-line separation. If you don't know, try out one of these schemes. – Christopher Jul 26 '10 at 12:37

score 2 · Answer 2 · answered Jul 26 '10 at 22:55

I did a fair bit of research, and it seems there is no silver-bullet solution to false sharing. Here is what I came up with (thanks to Christopher):

1) Pad your data on both sides with unused or less frequently used fields.
2) Copy your data onto the stack and copy it back after all the hard work is done.
3) Use cache-aligned memory allocation.

score 0 · Answer 3 · answered Jul 26 '10 at 09:53


I don't feel entirely sure about the details, but here's my take:

(1) Your simplified example is broken, since Boost's create_thread expects a reference and you pass a temporary.

(2) If you used a vector<Work> with one item for each thread, or otherwise placed the objects sequentially in memory, false sharing would occur.

answered Jul 26 '10 at 09:53

peterchen


(1) No, it isn't broken. create_thread accepts its argument by value. Check the declaration if you don't believe me. (2) I clearly stated that each thread gets its own copy. Check the code: the function object is passed by value. – user401947 Jul 26 '10 at 11:11
It is. Work is not copied onto the target thread's stack. It is 'newed' inside create_thread, and only a (shared) pointer is transferred to the target stack, where the data is accessed through that pointer. (I tested this by assigning the thread ID to the data member and then looking at the value in the operator() call.) – Christopher Jul 26 '10 at 11:51
(1) Are we talking about `thread* create_thread(const boost::function0<void>& threadfunc);`? That's what I found when trying to check the reference. – peterchen Jul 26 '10 at 14:45
It is declared as `template<typename F> thread* create_thread(F threadfunc);`. Which version of Boost are you talking about? – user401947 Jul 26 '10 at 20:49
The Boost 1.32 docs show only the overload I posted above, while the source code shows your prototype. Strange. Since the new thread object with its thread info is heap-allocated, there is a decent chance of several of them ending up in one cache line. Just modify your example to print the addresses of &data. – peterchen Jul 27 '10 at 05:33
