I am wondering if anyone could suggest the best approach to computing the mean / standard deviation of a large number of relatively small but differently sized arrays in CUDA?
The parallel reduction example in the SDK works on a single very large array and it seems the size is conveniently a multiple of the number of threads per block, but my case is rather different:
Conceptually, I have a large number of objects, each containing two components, upper and lower, and each component has an x and a y coordinate, i.e. upper.x, lower.x, upper.y, lower.y.
Each of these arrays is approximately 800 elements long, but the length varies between objects (not within an object), e.g.
Object1.lower.x = 1.1, 2.2, 3.3
Object1.lower.y = 4.4, 5.5, 6.6
Object1.upper.x = 7.7, 8.8, 9.9
Object1.upper.y = 1.1, 2.2, 3.3
Object2.lower.x = 1.0, 2.0, 3.0, 4.0, 5.0
Object2.lower.y = 6.0, 7.0, 8.0, 9.0, 10.0
Object2.upper.x = 11.0, 12.0, 13.0, 14.0, 15.0
Object2.upper.y = 16.0, 17.0, 18.0, 19.0, 20.0
Please note the above is just my way of representing the data; it is not stored in C structs or anything like that, and can be organised in whatever way I need. The point is that, for each array, the mean, standard deviation and eventually a histogram need to be computed, and within one particular object, ratios and differences between arrays need to be computed.
How should I go about sending this data to the GPU and organising my thread-block hierarchy? One idea I had was to zero-pad all my arrays to the same length and have a group of blocks working on each object, but that approach seems to bring all sorts of problems, if it would work at all.
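The alternative I keep coming back to, instead of zero-padding, is one block per array: each block reads its slice via the offsets, so array lengths need not be multiples of the block size. Below is an untested kernel sketch along those lines, adapting the SDK-style shared-memory tree reduction to accumulate both the sum and the sum of squares; all names (data, offsets, out_mean, out_sd) are my own invention:

```cuda
// Sketch (untested): one thread block per array. data is the flat
// buffer, offsets the per-array start indices (n_arrays + 1 entries);
// out_mean / out_sd receive one result per array.
__global__ void mean_sd_kernel(const float *data, const int *offsets,
                               float *out_mean, float *out_sd)
{
    extern __shared__ float s[];          // 2 * blockDim.x floats
    float *s_sum   = s;
    float *s_sumsq = s + blockDim.x;

    int begin = offsets[blockIdx.x];
    int end   = offsets[blockIdx.x + 1];
    int n     = end - begin;

    // Strided loop over this array: handles lengths that are not a
    // multiple of blockDim.x, so no zero-padding is needed.
    float sum = 0.0f, sumsq = 0.0f;
    for (int j = begin + threadIdx.x; j < end; j += blockDim.x) {
        float v = data[j];
        sum   += v;
        sumsq += v * v;
    }
    s_sum[threadIdx.x]   = sum;
    s_sumsq[threadIdx.x] = sumsq;
    __syncthreads();

    // Standard tree reduction in shared memory (blockDim.x must be a
    // power of two here).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            s_sum[threadIdx.x]   += s_sum[threadIdx.x + stride];
            s_sumsq[threadIdx.x] += s_sumsq[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        float mean = s_sum[0] / n;
        float var  = s_sumsq[0] / n - mean * mean;  // E[x^2] - mean^2
        out_mean[blockIdx.x] = mean;
        out_sd[blockIdx.x]   = sqrtf(var);
    }
}

// Launch sketch: one block per array, e.g.
// mean_sd_kernel<<<n_arrays, 256, 2 * 256 * sizeof(float)>>>(d_data, d_offsets, d_mean, d_sd);
```

Would something like this be reasonable, or is the E[x^2] - mean^2 formulation too numerically fragile for ~800-element single-precision arrays? And would a similar per-block scheme extend to the histograms and the per-object ratios/differences?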
Thanks in advance