I am using CUDA/Thrust/CUDPP. As I understand it, in stream compaction certain items in an array are marked as invalid and then "removed".

Now what does "removal" really mean here? Suppose the original array A has length 6. If 2 elements are invalid (by whatever condition we provide), then:

  1. Does the system create a new array of size 4 in GPU memory to store the valid elements and hold the final result?

  2. Or does it physically remove the invalid elements from memory and shrink the original array A down to size 4, keeping only the valid elements?

In either case, doesn't that mean dynamic memory allocation is happening under the hood? But I have heard that dynamic memory allocation is not possible in the CUDA world.

smilingbuddha
  • There is another possibility, which is that the size of the memory allocation doesn't change, and the first 4 elements are valid, the last 2 are undefined. But really this question is all about implementation issues, and who is to say that CUDPP or thrust work the same? – talonmies Dec 05 '11 at 18:18
  • ArrayFire is a better/easier option than Thrust and also free, at least for single GPU usage. http://accelereyes.com/arrayfire – arrayfire Dec 14 '11 at 01:40

1 Answer

First, dynamic memory allocation is possible in CUDA on Compute Capability 2.0 and higher devices. The CUDA runtime library supports malloc/free and new/delete in __device__ functions. But that is not germane to the answer, really.

Typically the caller provides a pre-allocated output array that is large enough (often the same size as the input array), and the valid elements are written to it. No dynamic allocation is required, though some storage may be wasted. This is what CUDPP and Thrust do. An alternative is to count the valid elements first and then allocate exactly that much GPU memory with cudaMalloc, called from the host.
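Both approaches can be sketched with Thrust roughly as follows. This is only an illustration, not how Thrust or CUDPP are implemented internally: the `is_valid` predicate (negative values count as invalid), the helper names `compact_preallocated` and `compact_counted`, and the sample data are all made up for the example. `thrust::copy_if` writes into a buffer the caller supplies and returns an iterator one past the last element written, so the number of valid elements is the distance from the start of the output.

```cuda
#include <thrust/copy.h>
#include <thrust/count.h>
#include <thrust/device_vector.h>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical validity test for this sketch: negative values are "invalid".
struct is_valid {
    __host__ __device__ bool operator()(int x) const { return x >= 0; }
};

// Approach 1: pre-allocate an output as large as the input (worst case),
// compact into it, and use the returned iterator to learn the valid count.
std::vector<int> compact_preallocated(const std::vector<int>& host_in) {
    thrust::device_vector<int> in(host_in.begin(), host_in.end());
    thrust::device_vector<int> out(in.size());  // may be larger than needed
    auto out_end = thrust::copy_if(in.begin(), in.end(), out.begin(), is_valid());
    std::size_t n_valid = out_end - out.begin();
    // Only the first n_valid elements of `out` are compaction results.
    std::vector<int> result(n_valid);
    thrust::copy(out.begin(), out_end, result.begin());
    return result;
}

// Approach 2: count the valid elements first, then allocate exactly that
// much device memory from the host before compacting.
std::vector<int> compact_counted(const std::vector<int>& host_in) {
    thrust::device_vector<int> in(host_in.begin(), host_in.end());
    std::size_t n_valid = thrust::count_if(in.begin(), in.end(), is_valid());
    thrust::device_vector<int> out(n_valid);  // exact-size allocation
    thrust::copy_if(in.begin(), in.end(), out.begin(), is_valid());
    std::vector<int> result(n_valid);
    thrust::copy(out.begin(), out.end(), result.begin());
    return result;
}

int main() {
    // Length-6 input with 2 invalid entries, mirroring the question.
    std::vector<int> a = {3, -1, 7, -1, 4, 9};
    std::vector<int> r1 = compact_preallocated(a);
    std::vector<int> r2 = compact_counted(a);
    std::printf("preallocated: %zu valid, counted: %zu valid\n", r1.size(), r2.size());
    for (int v : r1) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```

In neither variant is the input array shrunk or freed; the "removal" is simply that the invalid elements are not copied to the output. The first variant trades extra storage for a single pass over the data, while the second makes two passes (count, then copy) to get an exact-size allocation.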

harrism
  • That's not what Thrust does :) Compaction algorithms in Thrust (e.g., `thrust::copy_if`) typically ask for an output buffer. – Jared Hoberock Dec 06 '11 at 18:51
  • Thanks. Edited my answer. What if the result iterator points to an allocation that is not large enough? Is there any automatic sizing? – harrism Dec 07 '11 at 01:35
  • No; if the iterator points to an insufficient buffer, the behavior is undefined (i.e., it crashes). – Jared Hoberock Dec 07 '11 at 02:54