Multithreading a read-only function of a tree data structure.. What do i need to make thread safe

Question

I have a tree data structure. In addition to the normal insert and removal functions I have a COMPUTE function which computes certain values from the nodes present in the tree. The insert and removal functions affect the tree. But the COMPUTE function does not modify the tree. The COMPUTE function has its own internal queues and adds and deletes nodes to these queues. Here I present a simplified pseudocode of the compute function:

class tree{
    vector<int> value;
    tree * children
}

tree::COMPUTE(vector<int> inputValue)
{
    Set currentSet;
    Set nextSet
    currentSet.add(root);

    foreach (node in currentSet)
    {
        if(node has children)
            foreach(childNode of node)
            {
                if(CONDITION_SATISFIED(childNode))
                    add childNode to nextSet;
            }
        else
            output childNode
    }


}

So the COMPUTE function simply recurses through the tree, adds nodes to its own internal set datastructure and does some computations.

Now I want to multithread the COMPUTE function. Since it does not modify the tree, I guess this is easily doable. Each thread gives its own value and gets its own COMPUTE output. The code is written in C++.

My question is this:

If I call it like this:

void * threadRoutine
{
    tree.COMPUTE(thisThreadVal);
}

int main()
{
    tree myTree();

    //Insertion Code here

    call N threads each with different parameters but the same tree.
}

I dont this it is correct. This is because all the threads will call the same member function of the same tree object. So the internal data strcutures like the sets used in the COMPUTE method will be mangled.

I think I should write the compute function outside the tree and not as a member function of the tree. Can somebody tell me if this is right.

btw: In case somebody is curious, the exact tree that I will be using is called a Cover Tree. It is a relatively new data structure used for finding nearest neighbour queries.

Sorry about that. This is just a pseudocode. I dint want to post code since I thought that the pseudocode will be much clearer. — The Flying Dutchman, Aug 27 '12 at 09:02
You are declaring the sets insinde the compute function so they will not be "mangled", but I can't say about the "condition satisfied". — João Augusto, Aug 27 '12 at 09:06
See [Example for boost shared_mutex (multiple reads/one write)](http://stackoverflow.com/q/989795/105929) — Remus Rusanu, Aug 27 '12 at 09:06
actual code is pretty complex. Its a nearest neighbour algorithm. I can still post a much detailed version of the code if it helps. — The Flying Dutchman, Aug 27 '12 at 09:08
Condition Satisfied is pretty simple. Calculate the distance between the two vectors and see if it is within a threshold. It is also a read-only function. — The Flying Dutchman, Aug 27 '12 at 09:08

score 2 · Accepted Answer · answered Aug 27 '12 at 09:06

2

The approach taken will be safe for multiple COMPUTE (what's with the all-caps?) calls as long as the sets used are locals or else in the heap but the local pointer or reference to them is the only such pointer or reference - they will therefore stay out of each others' way.

(It can be thread-safe in other cases, but it gets a lot more complicated proving it).

It will not be safe to do this while another thread is doing the work that adds or deletes; computes are safe to happen along with other computes, but not along with modifications. If that can happen you need some sort of synchronisation, or to consider all the possible operations together and make them all thread-safe.

Whether the function is a member function or outside is completely irrelevant.

answered Aug 27 '12 at 09:06

Jon Hanna

110,372
10
146
251

All sets are completely local. It will just return a set of nodes that are withing a certain distance threshold of the queried node. btw:The caps was just for the emphasis ;) – The Flying Dutchman Aug 27 '12 at 09:10
Then since they're the only data modified (again, this isn't safe while another thread is doing modifications), it's thread-safe. Thread-safety comes down to one thread messing things up for another. Here the only thing shared by more than one thread isn't being changed, so there's no opportunity for them to mess each other about. – Jon Hanna Aug 27 '12 at 09:13
ya.. my main question was whether the local data structures will be shared between the threads if the function is made a member function. Thanks for the answer. – The Flying Dutchman Aug 27 '12 at 09:16
1

No, a level of abstraction lower, and the only difference with member functions is that there's a `this` pointer as the first parameter, that's hidden from you in C++. Apart from that, at that level they're exactly the same, so either way locals only get shared if you put a reference or pointer somewhere other threads can see it (strictly, they're almost always shared with most memory models, it's just that the other threads have no possible way of reaching it because they've no pointers to it). – Jon Hanna Aug 27 '12 at 09:29

score -1 · Answer 2 · answered Aug 27 '12 at 09:11

-1

After

if(node has children)

Just create a new thread. Add a mutex to protect nextSet

answered Aug 27 '12 at 09:11

Ed Heal

59,252
17
87
127

- add a mutes to protect nextSet -> Does this mean that nextSet will be shared among all the threads? If not, what is the use of the mutex? – The Flying Dutchman Aug 27 '12 at 09:13
nextset will be shared. Just to protect the data structure when used. – Ed Heal Aug 27 '12 at 09:21

Multithreading a read-only function of a tree data structure.. What do i need to make thread safe

2 Answers2