I'm writing a class in windows using visual studio, one of it's public function has a big for loop looks like below,
void brain_network_opencl::block_filter_fcd_all(int m)
{
const int m_block_len = m * block_len;
time_t start, end;
for (int j = 0; j < shift_2d_gpu[1]; j++) // local work size/number of rows per block
{
for (int i = 0; i < masksize; i++) // number of extracted voxels
{
if (j + m_block_len != i)
{
//if (floor(dst_ptr_gpu[i + j * masksize] * power_up) > threadhold_fcd)
if ((int)(dst_ptr_gpu[i + j * masksize] * power_up) > threadhold_fcd)
{
org_row = mask_ind[j + m_block_len];
org_col = mask_ind[i];
nodes.insert(org_row);
conns.insert(make_pair(org_row, org_col));
}
}
}
}
end = clock();
cout << end - start << "ms" << " for block" << j << endl;
}
where nodes
is std::set<set>
,conns
is std::multimap<int, int>
and mask_ind
is std::vector<int>
, they are declared as private variables as well as masksize and shift_2d_gpu;
Major time costs by floor
and .insert
;
The problem is, the same code (with all the variables) in a main function costs only 1/5~1 the time than it calls from here. And if I replace (int)
by floor
in both function and main(), it costs much more in this function;
What causes this problem and do I have to write it all inside a main()?
By the way does it has something to do with the overloads
?
floor
shows +3 overloads
and .insert
shows +5 overloads
updates
I copy the codes of this function to another new console project's main function.
It's still much slower than my first function (codes also in main)!!!
Now I'm confused...
It's there any settings that make floor
and .insert
faster?
updates 2014/03/31
It's because of the settings in Project Properties->Configuration Properties->C/C++->General->Debug Information Format, this value is set to P*rogram Database for Edit And Continue (/ZI)* as default and it is incompatible with a lot of optimizations according to msdn. If this value is set to Program Database (/Zi), the time cost of floor
wouldn't be 10 times of (int).
(I looked into Disassembly and found out that the length of codes (call floor
-> jmp floor
->different codes) are different when the setting is altered, that's the reason causes floor
and .insert
spent much more time than it should)