I'm writing a class on Windows using Visual Studio. One of its public functions has a big for loop that looks like this:

void brain_network_opencl::block_filter_fcd_all(int m)
{
    const int m_block_len = m * block_len;
    const clock_t start = clock();  // start timing this block

    for (int j = 0; j < shift_2d_gpu[1]; j++)  // local work size / number of rows per block
    {
        for (int i = 0; i < masksize; i++)  // number of extracted voxels
        {
            if (j + m_block_len != i)  // skip entries whose row index equals the column index
            {
                //if (floor(dst_ptr_gpu[i + j * masksize] * power_up) > threadhold_fcd)
                if ((int)(dst_ptr_gpu[i + j * masksize] * power_up) > threadhold_fcd)
                {
                    org_row = mask_ind[j + m_block_len];
                    org_col = mask_ind[i];

                    nodes.insert(org_row);
                    conns.insert(make_pair(org_row, org_col));
                }
            }
        }
    }

    const clock_t end = clock();
    // clock() returns ticks; with MSVC, CLOCKS_PER_SEC is 1000, so one tick is one millisecond
    cout << end - start << "ms" << " for block " << m << endl;
}

where nodes is std::set<int>, conns is std::multimap<int, int>, and mask_ind is std::vector<int>. They are declared as private member variables, as are masksize and shift_2d_gpu.

Most of the time is spent in floor and .insert.

The problem is that the same code (with all the variables) placed in a main function takes only 1/5~1 of the time it takes when called from here. And if I replace (int) with floor in both this function and main(), the extra cost is much higher in this function.

What causes this problem, and do I have to write it all inside main()? By the way, does it have something to do with overloads? floor shows +3 overloads and .insert shows +5 overloads.

updates

I copied the code of this function into the main function of a new console project. It's still much slower than my first version (whose code is also in main)!!! Now I'm confused... Are there any settings that make floor and .insert faster?

updates 2014/03/31

It's because of the setting in Project Properties -> Configuration Properties -> C/C++ -> General -> Debug Information Format. This value is set to Program Database for Edit And Continue (/ZI) by default, and according to MSDN that option is incompatible with a lot of optimizations. If the value is set to Program Database (/Zi), floor no longer costs 10 times as much as (int).

(I looked at the disassembly and found that the length of the code path (call floor -> jmp floor -> different code) changes when the setting is altered. That is why floor and .insert took much more time than they should.)

kbxu

1 Answer

As Gassa has pointed out, to optimize the tight loop use a custom floor function.
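
A minimal sketch of what such a custom floor could look like, assuming the value always fits in an int and is never NaN (outside that range the cast is undefined behaviour). The names fast_floor and fast_floor_demo are hypothetical and do not come from the original code:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: floor for values that fit in an int.
// Behaviour is undefined if x is NaN or outside the range of int.
inline int fast_floor(double x)
{
    const int i = static_cast<int>(x);  // the cast truncates toward zero
    return (x < i) ? i - 1 : i;         // step down once for negative non-integers
}

// Small self-check comparing the helper against std::floor.
inline void fast_floor_demo()
{
    const double samples[] = {2.7, 2.0, 0.0, -0.5, -2.0, -2.7};
    for (double x : samples)
        assert(fast_floor(x) == static_cast<int>(std::floor(x)));
}
```

For non-negative inputs the plain (int) cast already gives the same result as floor, so the extra comparison only matters when the product can be negative.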

set<int> isn't cache friendly, but to replace it with a cache-friendly structure you might need to alter the algorithm. Still, unordered_set<int>, with a decent amount of space reserved for it up front, should be a bit better, having fewer cache misses per insert than a binary tree.
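
As a rough illustration of the unordered_set suggestion, here is a sketch that reserves buckets before inserting. The function collect_nodes and the parameter expected_unique are hypothetical names chosen for this example. The right reservation size depends on how many distinct nodes the data actually produces:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: gather unique ids in a hash set whose buckets are reserved up front.
std::unordered_set<int> collect_nodes(const std::vector<int>& ids, std::size_t expected_unique)
{
    std::unordered_set<int> nodes;
    nodes.reserve(expected_unique);  // size the bucket array once to avoid rehashing during inserts
    for (int id : ids)
        nodes.insert(id);            // duplicates are ignored, as with std::set
    assert(nodes.size() <= ids.size());
    return nodes;
}
```

Unlike std::set, the result is not kept in sorted order, so any later code that relies on ordered iteration would need a separate sort.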

P.S. Non-virtual overloads in C++ are resolved at compile time and have no effect on performance.
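
One way to see the compile-time resolution, sketched here with the standard std::floor overloads from <cmath>: the return type of each call is already fixed during compilation, so it can be checked with static_assert and nothing is decided at run time.

```cpp
#include <cmath>
#include <type_traits>

// Each call is bound to one specific std::floor overload while compiling.
// The build would fail here if a different overload were selected.
static_assert(std::is_same<decltype(std::floor(1.0f)), float>::value, "float overload");
static_assert(std::is_same<decltype(std::floor(1.0)), double>::value, "double overload");
static_assert(std::is_same<decltype(std::floor(1.0L)), long double>::value, "long double overload");
```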

Community
ArtemGr