How do I parallelise a for loop, using `omp parallel` or otherwise?

Question

Assuming I have three integer vectors:

mainVect of size 8 million element
vect1 of size 1.5 million element
vect2 of size 1.5 million element

I want to run the following code:

int i,j;
for ( i = 0; i < vect1.size(); i++)
{
    for ( j = 0; j < mainVect.size(); j++)
    {
        if (vect1[i] == mainVect[j])
        {
            vect2[i]++;             
        }
    }
}

It took a very long time without finishing yet...How I can speed up the run, I have multiprocessors. As a try, I've added the following sentence before the previous code (I read that it run in parallel)

#pragma omp parallel for private(i, j) shared( mainVect, vect1, vect2)

But still slow ...

If I divide the for loop into 4 sections; for example, how I can make these for loops run simultaneously such as

for ( i = 0; i < vect1.size()/4; i++)
{

}

for ( i = vect1.size()/4; i < vect1.size()/2; i++)
{

}
.... and so on

Or any other methods ...

P.S.: Windows google cloud machine, n1-standard-4 (4 vCPUs, 15 GB memory) .. CPU utilization only 27% when run the above code.

how many cores you have is irrelevant if you're not writing code that uses multiple threads. I'm no expert on this syntax but doubt that what you've written tells the compiler how to parallelise your loop, instead only how it should share those variables if you *were* parallelising them. — underscore_d, Dec 08 '17 at 14:13
CPU usage only 27% hints towards your parallelization not working... — Fl.pf., Dec 08 '17 at 14:13
Possible duplicate of [Parallel Loops in C++](https://stackoverflow.com/questions/36246300/parallel-loops-in-c), [Parallel for loop in openmp](https://stackoverflow.com/questions/11773115/parallel-for-loop-in-openmp), etc. Did you search before asking? If so, you should explain why the answers you found did not resolve your problem, so people don't repeat them here again. — underscore_d, Dec 08 '17 at 14:22
@noor You don't know what? I didn't ask about your machine; I asked whether you were sure you were actually telling the compiler to parallelise, because it doesn't look to me like you are, neither from the syntax used nor from the fact of 27% usage, which is within sampling error of the figure you'd expect if your program was only saturating one of the 4 available threads (25%). — underscore_d, Dec 08 '17 at 14:23
I don't think any amount of linear speedup is going to help - you have *twelve thousand billion* iterations. Four cores optimally used would bring the waiting time down to the equivalent of three thousand billion. (At one nanosecond per iteration - which I believe is optimistic - that's from over three hours to just under one.) On the other hand, first counting the elements of `mainVect` and then doing 1.5 million table lookups could possibly cut the time down to a matter of seconds. — molbdnilo, Dec 08 '17 at 14:52

score 4 · Answer 1 · answered Dec 08 '17 at 14:31

4

8 million ints do not take much space. You may use unordered_map or some other efficient associative containers.

unordered_map<int, int> umap;
for (int v : mainVect) {
    umap[v]++;
}
for (int i = 0; i < vect1.size(); ++i) {
    if (umap.count(vect1[i])) {
        vect2[i] += umap[vect1[i]];
    }
}

This one takes linear time to populate vect2 which is very fast.

answered Dec 08 '17 at 14:31

abdullah

662
5
7

I like this one. It basically uses the same idea as mine, but the use of an associative container is probably a lot faster than sorting. I'm not a C++ expert and don't know what tools are in the box, but I know how to sort :-) Fact is that bot approaches outperform multi-threading easily. – Ronald Dec 08 '17 at 14:39

score 3 · Accepted Answer · answered Dec 08 '17 at 14:21

3

Using threads is one possible solution.

But the main question is: what problem are you trying to solve?

If I understand it correctly, you're counting the number of occurrences in mainVect of an element in vect1. Since you don't need to know where, you can rearrange (a copy of) mainVect.

My approach would be:

Sort mainVect
convert mainVect to a table consisting of "key" and number of occurrences
Sort vect1 and create an indirection vector. Iterating over this indirection vector gives the "key"s in ascending sequence
now you can "merge"

The complexity of this approach is O(n log n)

answered Dec 08 '17 at 14:21

Ronald

2,842
16
16

I can't sort mainVect – userInThisWorld Dec 08 '17 at 14:23
@noor then take a vector of indexes of mainVect and sort that vector instead – UKMonkey Dec 08 '17 at 14:25
Why not? you don't need to sort it in place. And you can also use an indirection table like @UKmonkey suggests – Ronald Dec 08 '17 at 14:25

How do I parallelise a for loop, using `omp parallel` or otherwise?

2 Answers2