I have a series of M single-channel images, each of size NxN, stored contiguously in a device memory array. (N is not a power of two.) So the array has length MxNxN. I need to find the sum of all pixels for each of these images, so the output is M values, one for each image.
I am generating an additional array that holds the image index of every pixel and using it as the key array for reduce_by_key, so that each image is one segment. This reduce_by_key seems to be pretty slow, taking more time than everything else I'm doing on these pixels.
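A minimal sketch of the setup described above, for reference. It assumes that reduce_by_key means Thrust's `thrust::reduce_by_key` and that the pixels are `float`; neither is stated explicitly. The names `PixelToImage` and `sum_per_image` are hypothetical and only serve the illustration.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/reduce.h>
#include <thrust/transform.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: maps a flat pixel index to the index of its image.
struct PixelToImage {
    int pixels_per_image;
    __host__ __device__ int operator()(int i) const { return i / pixels_per_image; }
};

// Sums the pixels of M images of size NxN stored back to back in `pixels`.
// Assumption: pixel type is float.
std::vector<float> sum_per_image(const thrust::device_vector<float>& pixels, int M, int N)
{
    const int pixels_per_image = N * N;
    const int total = M * pixels_per_image;

    // The additional array: one image index (key) per pixel.
    thrust::device_vector<int> keys(total);
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(total),
                      keys.begin(),
                      PixelToImage{pixels_per_image});

    // One output key and one output sum per image (segment).
    thrust::device_vector<int> keys_out(M);
    thrust::device_vector<float> sums(M);
    thrust::reduce_by_key(keys.begin(), keys.end(),
                          pixels.begin(),
                          keys_out.begin(),
                          sums.begin());

    // Copy the M sums back to the host.
    std::vector<float> result(M);
    thrust::copy(sums.begin(), sums.end(), result.begin());
    return result;
}

int main()
{
    const int M = 3, N = 5;  // N is deliberately not a power of two
    thrust::device_vector<float> pixels(M * N * N);

    // Fill image m with the constant value m + 1.
    for (int m = 0; m < M; ++m) {
        thrust::fill(pixels.begin() + m * N * N,
                     pixels.begin() + (m + 1) * N * N,
                     static_cast<float>(m + 1));
    }

    std::vector<float> sums = sum_per_image(pixels, M, N);
    for (int m = 0; m < M; ++m) {
        std::printf("image %d: sum = %f\n", m, sums[m]);
    }
    return 0;
}
```

In this sketch the key array costs one extra write and one extra read of M*N*N integers on top of the pixel data itself.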
Is there a faster way to do this segmented reduction sum, where the segments are all the same size?