0

I have an error, i don't understand why collapse doesn't work in my code.

    #pragma omp parallel num_threads(IntervalMapEstimator::m_num_thread)
    {
        std::vector<Point3D> local_relevant_points;

        #pragma omp for collapse(2)
        for(int i = first_list_index; i < last_list_index ; i++)
        {
            for (int k = 0; k < pointcloud_ff_polar_angle_lists[i].size(); k++)
            {
                if ( pointcloud_ff_polar_angle_lists[i][k].pol_sensor_rot.phi >= cell_start_angle && pointcloud_ff_polar_angle_lists[i][k].pol_sensor_rot.phi <= cell_end_angle )
                {
                    #pragma omp critical
                    {
                        relevant_points.push_back(pointcloud_ff_polar_angle_lists[i][k]);
                    }
                }
            }
        }
    }

It doesn't work with the collapse, but it does when i'm remove the collapse. I can't figure out why, does somebody see why ? It also work in single thread.

tony497
  • 360
  • 1
  • 3
  • 15
  • Even though the answer may be obvious, your question needs to be improved. *"I have an error ... it doesn't work"* is not a valid problem description. Provide a [mcve] and a clear description of the issue! – Zulan May 23 '18 at 23:58

1 Answers1

4

Regarding the collapse statement, a nice explanation is provided here: Understanding the collapse clause in openmp. As stated in the answer, the collapse clause will only work when the inner loop does not depend on the outer, which in your case does not appear to hold as the size of the vector is not guaranteed to be the same for all values of i.

Unfortunately I do not have the reputation to post comments yet, so I will resort to posting this here - even though it is not exactly related to the original question.

You seem to be parallelizing a loop with fairly simple instructions, it is very likely that the threads will spend a considerable amount of time waiting around that critical region. Although I assume that is what local_relevant_points will be used for.

Qubit
  • 1,175
  • 9
  • 18
  • Thanks for your answer, I just saw the link you posted. I understand know why it crashed, but do youknow if there any work around possible ? – tony497 May 22 '18 at 13:21
  • What exactly do you want to achieve? – Qubit May 22 '18 at 13:29
  • By the way, you're right with the time consuming method of the critical section. I usually pushback in a local vector and after the for loop do a critical section where i insert all the local data in the relevant_points vector. But i wanted my code to be clearer for the post. – tony497 May 22 '18 at 13:29
  • In the end I want my code to be faster. In general the first loop is only on 3 index and the second is greater more like 20 to 50 values. – tony497 May 22 '18 at 13:33
  • 1
    That's a total of ~150 values you are iterating over, given the expressions are fairly simple this should take no time on a single thread. However, if you really want to do it, I suppose what is best depends on the number of threads as well, if you have 3 and the inner loops are fairly uniform, then parallelize that, if you have many more, you can parallelize both loops, but in this case I worry that managing the threads takes about as much time as solving the problem on a single thread. – Qubit May 22 '18 at 13:37
  • Ok, i will depend on the values i will get then. In my case if think i will let it like that, it make more sense because i'm not sure which value will be sent. Thanks for you help. – tony497 May 22 '18 at 13:43