0

Assuming I have three integer vectors:

  • mainVect of size 8 million element
  • vect1 of size 1.5 million element
  • vect2 of size 1.5 million element

I want to run the following code:

int i,j;
for ( i = 0; i < vect1.size(); i++)
{
    for ( j = 0; j < mainVect.size(); j++)
    {
        if (vect1[i] == mainVect[j])
        {
            vect2[i]++;             
        }
    }
}

It took a very long time without finishing yet...How I can speed up the run, I have multiprocessors. As a try, I've added the following sentence before the previous code (I read that it run in parallel)

#pragma omp parallel for private(i, j) shared( mainVect, vect1, vect2)

But still slow ...

If I divide the for loop into 4 sections; for example, how I can make these for loops run simultaneously such as

for ( i = 0; i < vect1.size()/4; i++)
{

}

for ( i = vect1.size()/4; i < vect1.size()/2; i++)
{

}
.... and so on

Or any other methods ...

P.S.: Windows google cloud machine, n1-standard-4 (4 vCPUs, 15 GB memory) .. CPU utilization only 27% when run the above code.

underscore_d
  • 6,309
  • 3
  • 38
  • 64
userInThisWorld
  • 1,361
  • 4
  • 18
  • 35
  • 2
    how many cores you have is irrelevant if you're not writing code that uses multiple threads. I'm no expert on this syntax but doubt that what you've written tells the compiler how to parallelise your loop, instead only how it should share those variables if you *were* parallelising them. – underscore_d Dec 08 '17 at 14:13
  • 1
    CPU usage only 27% hints towards your parallelization not working... – Fl.pf. Dec 08 '17 at 14:13
  • Can you sort `mainVect` ? – Jarod42 Dec 08 '17 at 14:15
  • For vector operation you could use gpu instead. – Petar Petrov Dec 08 '17 at 14:19
  • @underscore_d I don't know! It's a Google cloud machine. – userInThisWorld Dec 08 '17 at 14:21
  • @Jarod42 No I can't sort it – userInThisWorld Dec 08 '17 at 14:21
  • @PetarPetrov this is not available now – userInThisWorld Dec 08 '17 at 14:22
  • Possible duplicate of [Parallel Loops in C++](https://stackoverflow.com/questions/36246300/parallel-loops-in-c), [Parallel for loop in openmp](https://stackoverflow.com/questions/11773115/parallel-for-loop-in-openmp), etc. Did you search before asking? If so, you should explain why the answers you found did not resolve your problem, so people don't repeat them here again. – underscore_d Dec 08 '17 at 14:22
  • @noor You don't know what? I didn't ask about your machine; I asked whether you were sure you were actually telling the compiler to parallelise, because it doesn't look to me like you are, neither from the syntax used nor from the fact of 27% usage, which is within sampling error of the figure you'd expect if your program was only saturating one of the 4 available threads (25%). – underscore_d Dec 08 '17 at 14:23
  • 2
    I don't think any amount of linear speedup is going to help - you have *twelve thousand billion* iterations. Four cores optimally used would bring the waiting time down to the equivalent of three thousand billion. (At one nanosecond per iteration - which I believe is optimistic - that's from over three hours to just under one.) On the other hand, first counting the elements of `mainVect` and then doing 1.5 million table lookups could possibly cut the time down to a matter of seconds. – molbdnilo Dec 08 '17 at 14:52

2 Answers2

4

8 million ints do not take much space. You may use unordered_map or some other efficient associative containers.

unordered_map<int, int> umap;
for (int v : mainVect) {
    umap[v]++;
}
for (int i = 0; i < vect1.size(); ++i) {
    if (umap.count(vect1[i])) {
        vect2[i] += umap[vect1[i]];
    }
}

This one takes linear time to populate vect2 which is very fast.

abdullah
  • 662
  • 5
  • 7
  • I like this one. It basically uses the same idea as mine, but the use of an associative container is probably a lot faster than sorting. I'm not a C++ expert and don't know what tools are in the box, but I know how to sort :-) Fact is that bot approaches outperform multi-threading easily. – Ronald Dec 08 '17 at 14:39
3

Using threads is one possible solution.

But the main question is: what problem are you trying to solve?

If I understand it correctly, you're counting the number of occurrences in mainVect of an element in vect1. Since you don't need to know where, you can rearrange (a copy of) mainVect.

My approach would be:

  1. Sort mainVect
  2. convert mainVect to a table consisting of "key" and number of occurrences
  3. Sort vect1 and create an indirection vector. Iterating over this indirection vector gives the "key"s in ascending sequence
  4. now you can "merge"

The complexity of this approach is O(n log n)

Ronald
  • 2,842
  • 16
  • 16