
I am reading through a large grid using two nested for loops - one covering the rows, one the columns. Within the nested loops, I have this code:

for(int row=0; row<rows; row++)
{
    for(int col=0; col<cols; col++)
    {
        input = grid[row][col];    // reading in the initial elevation
        if(input != NODATA)
        {
            //perform some calculation
        }else{
            //skip calculation
            // approximately 10 calculated variables/values are set to no data
        }
    }
}

If all cells containing "actionable" data are always larger than my NODATA value (-9999), and the order in which I visit them doesn't necessarily matter, I suppose I could transcribe my grid into a linear vector sorted in decreasing order, use a single for loop to traverse the vector, and step out of the loop when I hit the first of the 300,000 NODATA values. Would there be a noticeable decrease in processing time using that method?
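
For reference, a minimal sketch of that idea (this assumes the grid is a std::vector<std::vector<float>> of elevations; the element type and the calculation itself are placeholders, not taken from my actual code):

#include <algorithm>
#include <functional>
#include <vector>

const float NODATA = -9999.0f;

void process_sorted(const std::vector<std::vector<float>>& grid)
{
    // Flatten the 2-D grid into one linear vector.
    std::vector<float> flat;
    flat.reserve(grid.size() * grid[0].size());
    for (const auto& row : grid)
        flat.insert(flat.end(), row.begin(), row.end());

    // Sort in decreasing order so all NODATA (-9999) cells end up at the back.
    std::sort(flat.begin(), flat.end(), std::greater<float>());

    // Single loop; stop at the first NODATA value.
    for (float input : flat)
    {
        if (input == NODATA)
            break;
        // perform some calculation on 'input'
    }
}

Note that the full sort costs O(n log n); since I only need to separate NODATA cells from everything else, a plain O(n) partition (see the second edit below) would do the same separation more cheaply.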

I don't really possess the skill set to set up and run such an experiment. I'm sure in a year or two I could.

Edit: Up to 30% of my grid cells may have NODATA values, though the fraction is highly variable. Typically I'm reading in, and then performing calculations on, a grid of over 1 million cells - roughly 1,000 rows by 1,000 columns, and sometimes up to 7,000-10,000 on each side.
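
For concreteness, a rough sketch of the kind of timing harness that would answer this, using the sizes above (1,000 x 1,000 cells, ~30% NODATA). The data here is synthetic and the calculation is a stand-in, so absolute numbers won't transfer to the real code, but it compares the branchy full scan against a partition-then-tight-loop pass:

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

int main()
{
    const int rows = 1000, cols = 1000;
    const float NODATA = -9999.0f;

    // Build a synthetic grid: ~30% NODATA, rest random "elevations".
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> elev(0.0f, 3000.0f);
    std::bernoulli_distribution is_nodata(0.3);
    std::vector<float> grid(static_cast<std::size_t>(rows) * cols);
    for (auto& cell : grid)
        cell = is_nodata(rng) ? NODATA : elev(rng);

    using Clock = std::chrono::steady_clock;

    // Variant 1: branch on every cell.
    auto t0 = Clock::now();
    double sum1 = 0.0;
    for (float input : grid)
        if (input != NODATA)
            sum1 += input * 0.5f;          // stand-in calculation
    auto t1 = Clock::now();

    // Variant 2: copy + partition once, then loop over valid cells only.
    auto t2 = Clock::now();
    std::vector<float> flat = grid;
    auto mid = std::partition(flat.begin(), flat.end(),
                              [NODATA](float v) { return v != NODATA; });
    double sum2 = 0.0;
    for (auto it = flat.begin(); it != mid; ++it)
        sum2 += *it * 0.5f;                // same stand-in calculation
    auto t3 = Clock::now();

    auto us = [](Clock::time_point a, Clock::time_point b) {
        return std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
    };
    std::printf("branchy scan:     %lld us (sum %f)\n"
                "partition + loop: %lld us (sum %f)\n",
                (long long)us(t0, t1), sum1, (long long)us(t2, t3), sum2);
}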

Edit: Is there some sort of quick, lazy sort where I could just throw everything above a certain threshold towards one side of the vector and everything below it towards the other? Possibly while copying to another empty vector of the known input size? That is, fill the second vector with every non-NODATA value starting from the first position and every NODATA value starting from the last position, and keep track of the last position filled when the procedure exited? Then I could just end the single for loop at that position.
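
What this edit describes is essentially a partition rather than a sort, and the standard library already provides it. A sketch, assuming the grid has already been flattened into a std::vector<float> as in the first sketch above:

#include <algorithm>
#include <cstddef>
#include <vector>

const float NODATA = -9999.0f;

// Copy valid cells to the front of 'out' and NODATA cells to the back,
// in a single O(n) pass.  Returns one past the index of the last valid cell.
std::size_t partition_copy_grid(const std::vector<float>& flat, std::vector<float>& out)
{
    out.resize(flat.size());
    auto ends = std::partition_copy(flat.begin(), flat.end(),
                                    out.begin(),   // non-NODATA values fill from the front
                                    out.rbegin(),  // NODATA values fill from the back
                                    [](float v) { return v != NODATA; });
    return static_cast<std::size_t>(ends.first - out.begin());
}

The single for loop would then run from out.begin() up to the returned position, with no NODATA check inside it. An in-place std::partition on the flattened vector does the same job without the extra copy.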

traggatmot
  • Maybe. You have to benchmark it to get your answer. See this [related question](http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array) – user703016 Feb 03 '15 at 02:41
  • More information is needed to make a reasonable guess at an answer to the question: what are typical values of `rows` and `cols` (or alternatively, what proportion of elements are set to `NODATA`)? I trust you're compiling with at least `-O2` (Windoze `cl /O2`) optimisation? Have you measured the time needed to do "some calculations" vs. "10 calculated values set to no data", and the time the loop takes to, say, count the number of NODATA elements? Given work on all elements, the main issue is whether the cost of branch misprediction is significant in comparison to branch-specific processing. – Tony Delroy Feb 03 '15 at 02:45
  • Couldn't you detect the position of the 300,000th NODATA value while setting data into grid[][]? If you know the position where you would escape from the for loop, you may not need to count the NODATA values again. Sorting is a rather costly process - its cost is O(n log n) or higher - so your plan using sort may not reduce processing time in total. – Fumu 7 Feb 03 '15 at 02:46
  • You can do the partial sort as your edit suggests - add valid values to the start of the new vector and NODATA ones from the end back. Even though you can keep this sorted fairly easily, I don't think it's going to speed things up if a significant number of cells swap between valid data and NODATA. – Richard Byron Feb 03 '15 at 06:49

0 Answers