I have a vector of 10M floats. I want the sum of every 100 consecutive elements, so 100,000 sums in total. What is the fastest way to do this?
If you can use `OpenCL` 2, then I recommend @huseyin's answer [here](https://stackoverflow.com/questions/46861492/what-is-the-optimum-opencl-2-kernel-to-sum-floats) – kenba Dec 01 '17 at 07:29
1 Answer
I'd recommend using the `reduce_by_key` algorithm, fancy iterators, and a Boost.Compute lambda expression. Every 100 elements are marked with the same key and reduced. I'm not sure whether you can replace `keys_output` with a `discard_iterator` to save some performance.
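    #include <boost/compute/algorithm/reduce_by_key.hpp>
    #include <boost/compute/container/vector.hpp>
    #include <boost/compute/iterator/counting_iterator.hpp>
    #include <boost/compute/iterator/transform_iterator.hpp>
    #include <boost/compute/lambda.hpp>

    // values_input is assumed to be a boost::compute::vector<float> already on the device.
    boost::compute::vector<int> keys_output(values_input.size() / 100, context);
    // The sums are floats, so the output vector must hold float, not int.
    boost::compute::vector<float> values_output(values_input.size() / 100, context);

    boost::compute::reduce_by_key(
        // Keys: element i gets key i / 100, so each run of 100 elements shares one key.
        boost::compute::make_transform_iterator(
            boost::compute::make_counting_iterator<int>(0),
            boost::compute::_1 / 100
        ),
        boost::compute::make_transform_iterator(
            boost::compute::make_counting_iterator<int>(values_input.size()),
            boost::compute::_1 / 100
        ),
        values_input.begin(),
        keys_output.begin(),
        values_output.begin(),
        queue
    );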
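
To check the device results, a plain CPU reference that applies the same key rule (index divided by the group size) can help. This is only a sketch: `sum_groups_reference` is a hypothetical helper, not part of Boost.Compute, and it also emits a sum for a trailing partial group, which matches the snippet above only when the input size is a multiple of 100.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Boost.Compute): sums consecutive groups of
// `group_size` values on the CPU. Element i contributes to group i / group_size,
// which is the same key rule the transform iterators apply on the device.
std::vector<float> sum_groups_reference(const std::vector<float>& values,
                                        std::size_t group_size)
{
    std::vector<float> sums;
    if (group_size == 0) {
        return sums;  // nothing sensible to compute
    }
    // Round up so that a trailing partial group still gets its own sum.
    sums.assign((values.size() + group_size - 1) / group_size, 0.0f);
    for (std::size_t i = 0; i < values.size(); ++i) {
        sums[i / group_size] += values[i];
    }
    return sums;
}
```

Copying `values_output` back to the host and comparing it against `sum_groups_reference(host_values, 100)` with a small tolerance should catch key or type mistakes; exact equality is not guaranteed because the device may add the floats in a different order.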

haahh
The `reduce_by_key` approach is certainly not as fast as hand-written OpenCL code, because this is a special case: on most GPUs you can sum each run of 100 elements of the vector in one or two steps (kernels). – haahh Dec 01 '17 at 11:15
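
For illustration, here is roughly what a single-kernel version could look like. This is a sketch under assumptions, not a tuned implementation: the kernel name `sum_groups` and its signature are made up for this example, and it uses the simplest possible layout (one work-item per group of 100). The C++ function `emulate_sum_groups` is a hypothetical host-side emulation of the same per-work-item loop, so the logic can be checked without an OpenCL device.

```cpp
#include <cstddef>
#include <vector>

// OpenCL C source for a hypothetical kernel (name and signature are
// illustrative). Work-item `gid` sums the group_size consecutive floats
// starting at gid * group_size, so one launch with one work-item per
// output sum does the whole job.
static const char* const kSumGroupsSource = R"CLC(
__kernel void sum_groups(__global const float* input,
                         __global float* output,
                         const uint group_size)
{
    const size_t gid = get_global_id(0);
    float acc = 0.0f;
    for (uint i = 0; i < group_size; ++i) {
        acc += input[gid * group_size + i];
    }
    output[gid] = acc;
}
)CLC";

// Host-side emulation of the kernel above: the outer loop plays the role of
// the global work size (one iteration per work-item). Only full groups are
// processed; trailing elements that do not fill a group are ignored.
std::vector<float> emulate_sum_groups(const std::vector<float>& input,
                                      std::size_t group_size)
{
    if (group_size == 0) {
        return {};
    }
    const std::size_t groups = input.size() / group_size;
    std::vector<float> output(groups);
    for (std::size_t gid = 0; gid < groups; ++gid) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < group_size; ++i) {
            acc += input[gid * group_size + i];
        }
        output[gid] = acc;
    }
    return output;
}
```

The kernel string would still have to be compiled and enqueued through an OpenCL host API with a global size equal to the number of groups. With this layout neighbouring work-items read addresses 100 floats apart, so a version that assigns a whole work-group to each segment may well be faster; measuring on the target GPU is the only way to know.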