Optimizing code with a lookup table

Question

This portion of my code takes too long to run and I was looking for a way to optimize it. I think a lookup table would be the fastest way but I could be wrong. My program has a main for loop and for each iteration in the main for loop, a nested loop goes through 1,233,487 iterations and then goes through the if statements if the conditions are met. The main for loop goes through 898,281 iterations so it must go through 898,281 * 1,233,487 calculations. How would I go about creating a lookup table to optimize these calculations/is there a better way to optimize my code.

for (int i = 0; i < all_neutrinos.size(); i++)
{ //all_neutrinos.size() = 898281
    int MC_count = 0;  //counts per window in the Monte Carlo simulation
    int count = 0; //count per window for real data

    if (cosmic_ray_events.size() == MC_cosmic_ray_events.size()) 
    {
        for (int j = 0; j < cosmic_ray_events.size(); j++) 
        { //cosmic_ray_events.size() = 1233487
            if ((MC_cosmic_ray_events[j][1] >= (all_neutrinos[i][3] - band_width))
             && (MC_cosmic_ray_events[j][1] <= (all_neutrinos[i][3] + band_width)))
            {
                if ((earth_radius * fabs(all_neutrinos[i][2] - MC_cosmic_ray_events[j][0]))
                     <= test_arc_length)
                {
                    MC_count++;
                }
            }   

            if ((cosmic_ray_events[j][7] >= (all_neutrinos[i][3] - band_width))
             && (cosmic_ray_events[j][7] <= (all_neutrinos[i][3] + band_width)))
            {
                if(earth_radius * fabs(all_neutrinos[i][2] - cosmic_ray_events[j][6])
                    <= test_arc_length)
                {
                    count++;
                }
            }
        }
        MCcount_out << i << "     " << MC_count << endl;
        count_out << i << "     " << count << endl;
    }
}

What do you want to look up in your lookup table? Lookup table is only applicable if you have calculations that give the same results every time the program is run, and it looks like you can't say if these calculations are repeatable in advance, can you? BTW if doesn't have to be followed by else statement. You can just delete empty else from your code. — KjMag, Jun 22 '17 at 17:11
I think you should be able to lower complexity by sorting your elements and then use code similar to `std::merge`. You currently have `N * M` whereas you should have something like `N log N + M log M`. — Jarod42, Jun 22 '17 at 17:40
You may look at Bentley-Ottmann algorithm, [possible-interview-question-how-to-find-all-overlapping-intervals](http://stackoverflow.com/questions/4542892/possible-interview-question-how-to-find-all-overlapping-intervals) and also [collision-detection-part-2](https://0fps.net/2015/01/18/collision-detection-part-2/). — Jarod42, Jun 22 '17 at 17:47

score 4 · Accepted Answer · answered Jun 22 '17 at 18:06

First cosmic_raw_events and MC_cosmic_ray_events are utterly unrelated. Make it two loops.

Sort MC_cosmic_ray_events by [1]. Sort cosmic_ray_events by [7]. Sort all_neutrinos by [3].

This doesn't have to be in-place sorting -- you can sort an array of pointers or indexes into them if you want.

Start with a highwater and lowwater index into your cosmic ray events set to 0.

Now, walk over all_neutrinos. For each one, advance highwater until MC_cosmic_ray_events[highwater][1] > all_neutrinos[i][3] + band_width). Then advance lowwater until MC_cosmic_ray_events[lowwater][1] >= all_neutrinos[i][3] - band_width).

On the half-open range j = lowwater upto but not including highwater, run:

if (
  (earth_radius * fabs(all_neutrinos[i][2] - MC_cosmic_ray_events[j][0]))
  <= test_arc_length
) {
  MC_count++;
}

Now repeat until i reaches the end of all_neutrinos.

Then repeat this process, using cosmic_ray_events and [7].

Your code takes O(NM) time. This code takes O(N lg N + M lg M + N * (average bandwidth intersect rate) time. If relatively few pass the bandwidth test, you are going to be insanely faster.

Assuming you get an average of 0.5 intersects per all_neutrinos, this will be on the order of 100000x faster.

score -1 · Answer 2 · answered Jun 22 '17 at 18:37

There is not much to optimize. The counts are really high, and there is not much hard computation going on. There are some obvious optimizations you could do, such as storing (all_neutrinos[i][3] +/- bandwitdth) in local variables before entering the j-loop. You compiler probably already does this, though, but this would certainly improve performance in debug mode.

Have you tried separating the two halves of the j-loop and have two j-loops? as in:

   auto all_neutrinos_2 = all_neutrinos[i][2];
   //... precompute bandwidth limits
   for (int j = 0; j < cosmic_ray_events.size(); j++) 
    { //cosmic_ray_events.size() = 1233487
        auto MC_events = MC_cosmic_ray_events[j][1];
        if ((all_neutrinos_lower <= MC_events) &&(MC_cosmic_ray_events[j][1] <= all_neutrinos_higher))
        {
            if ((earth_radius * fabs(all_neutrinos_2 - MC_cosmic_ray_events[j][0]))
                 <= test_arc_length)
            {
                MC_count++;
            }
        }
    }


    for (int j = 0; j < cosmic_ray_events.size(); j++) 
    { //cosmic_ray_events.size() = 1233487
        auto events = cosmic_ray_events[j][7];
        if ((all_neutrinos_lower <= events) && (events  <= all_neutrinos_higher))
        {
            if(earth_radius * fabs(all_neutrinos_2 - cosmic_ray_events[j][6])
                <= test_arc_length)
            {
                count++;
            }
        }
    }

I have the feeling you could get some improvement from improved memory cache hits this way.

Any improvement beyond that would involve packing the input data to reduce memory cache misses, and would involve modifying the structure and code generating the MC_cosmic_ray_events and cosmic_ray_events arrays

Slicing the counts in severals smaller tasks running on different threads is also a route I would look at seriously at this point. Data access is read only, and each thread can have its own counter, which can all be summed in the end.

*There is not much to optimize* what a nonsense. – Walter Jun 23 '17 at 09:41 — Walter, Jun 23 '17 at 09:41

Optimizing code with a lookup table

2 Answers2