I'm using cub::DeviceScan functions, and the sample code snippet has a parameter temp_storage_bytes, which it uses to allocate memory (which, incidentally, the code snippet never frees).
The code snippet calls a cub::DeviceScan function with a NULL temporary-storage pointer, which triggers it to calculate the amount of temporary device memory the function needs and then return without performing the scan. The required temporary memory is then allocated with cudaMalloc, and the function call is repeated, this time pointing to that memory. The temporary memory is then freed with cudaFree (or probably should be).
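For reference, here is a minimal sketch of the two-call pattern described above. It assumes the scan in question is cub::DeviceScan::InclusiveSum on float data (the other DeviceScan entry points follow the same convention); the array size and variable names are illustrative, and error checking is omitted for brevity.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const int num_items = 8;                    // illustrative size
    const size_t bytes = num_items * sizeof(float);
    std::vector<float> h_in(num_items, 1.0f);   // eight ones

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // Call 1: the temp-storage pointer is NULL, so CUB only writes the
    // required size into temp_storage_bytes and does no scan work.
    void *d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;
    cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes,
                                  d_in, d_out, num_items);

    // Allocate the scratch buffer CUB asked for.
    cudaMalloc(&d_temp_storage, temp_storage_bytes);

    // Call 2: same arguments, now with real storage, so the scan runs.
    cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes,
                                  d_in, d_out, num_items);

    std::vector<float> h_out(num_items);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    for (float v : h_out) std::printf("%g ", v);  // running sum of the ones
    std::printf("\n");

    // The step the sample snippet leaves out.
    cudaFree(d_temp_storage);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```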
I'm running the device scan many times on different float arrays, but every array has the same length.
My question is: can I assume that temp_storage_bytes will always be the same value? If so, I can do a single cudaMalloc and a single cudaFree for many function calls.
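To make the idea concrete, here is a rough sketch of what I have in mind. It does not rely on the size being constant: it re-runs the size query before each scan (that call does no scan work) and reallocates only if the requested size ever exceeds what is already held. If the size really is the same for every equal-length float array, the buffer is allocated exactly once. The helper name scan_all, the use of InclusiveSum, and the sizes in main are my own illustrative assumptions, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: scans several equal-length device arrays while
// sharing one scratch buffer across all calls.
void scan_all(const std::vector<float*>& d_inputs,
              const std::vector<float*>& d_outputs,
              int num_items) {
    void *d_temp = nullptr;   // shared scratch buffer
    size_t capacity = 0;      // bytes currently allocated in d_temp

    for (size_t i = 0; i < d_inputs.size(); ++i) {
        // Size query: NULL storage pointer, only `needed` is written.
        size_t needed = 0;
        cub::DeviceScan::InclusiveSum(nullptr, needed,
                                      d_inputs[i], d_outputs[i], num_items);

        // Grow the buffer only if this call asks for more than we hold.
        if (needed > capacity) {
            cudaFree(d_temp);             // cudaFree(nullptr) is a no-op
            cudaMalloc(&d_temp, needed);
            capacity = needed;
        }

        // Actual scan, reusing the shared buffer.
        cub::DeviceScan::InclusiveSum(d_temp, needed,
                                      d_inputs[i], d_outputs[i], num_items);
    }
    cudaFree(d_temp);                     // single free at the end
}

int main() {
    const int num_items = 1024;           // illustrative sizes
    const int num_arrays = 3;
    const size_t bytes = num_items * sizeof(float);
    std::vector<float> h_in(num_items, 1.0f);

    std::vector<float*> d_inputs(num_arrays, nullptr);
    std::vector<float*> d_outputs(num_arrays, nullptr);
    for (int i = 0; i < num_arrays; ++i) {
        cudaMalloc(&d_inputs[i], bytes);
        cudaMalloc(&d_outputs[i], bytes);
        cudaMemcpy(d_inputs[i], h_in.data(), bytes, cudaMemcpyHostToDevice);
    }

    scan_all(d_inputs, d_outputs, num_items);
    cudaDeviceSynchronize();

    for (int i = 0; i < num_arrays; ++i) {
        cudaFree(d_inputs[i]);
        cudaFree(d_outputs[i]);
    }
    return 0;
}
```

The extra size query per iteration is the price of not assuming anything about how temp_storage_bytes is computed; if the value is guaranteed to be fixed for a given length and type, the query could be hoisted out of the loop.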
The example is unclear on how the required memory is determined and whether it can change for a given array of a given length.