Problem
Suppose I have a big array in global memory filled with mostly 0s, but with some elements about (25%) being numbers:
[9, 0, 0, 0, 7, 0, 0, 3, 0, 0, 0, 0, 5, 0, 0, 0, 8, 0, 2, 0, 0, 4, 0, 0, 0, 0, 5]
Is there any efficient kernel that compresses that array by moving numbers > 0 to the left side?
[9, 7, 3, 5, 8, 2, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
What I've tried
I've thought in many potential solutions, usually taking inspiration from binary reduce algorithms, but those don't really apply, because I'm not compressing all numbers into one, but many. A semi-full initial array, for example, wouldn't be changed considerably. A way to do it, of course, would be to move the array to the CPU and apply the obvious sequential algorithm:
var j = 0;
var x = 0;
for (int i = 0; i < len; ++i) {
if (arr[i] !== 0) {
x = arr[j];
arr[j] = arr[i];
arr[j] = x;
}
if (arr[j] > 0) {
j += 1;
}
}
But the overhead may be significant for big enough arrays. I couldn't find any solution that takes less than O(N)
kernel invocations.