I currently have C++ code that I am porting to CUDA. The C++ code uses std::vector for data storage. I am fairly new to CUDA, and I understand that std::vector cannot be used directly in device code.
The number of elements to store depends on the result of some computation (basically a threshold check: samples greater than a threshold are stored). I understand that dynamic memory allocation with malloc inside a kernel is very slow. So one option is to fix the maximum number of elements, allocate memory for that upper bound, and rewrite the code to use plain arrays instead of vectors (a rough sketch of what I mean is below). The disadvantages are wasted memory, since I store anywhere between 0 and 100 elements, and of course a lot of rewriting.
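Here is roughly what I picture for that option; MAX_ELEMS, threshold_kernel, and the buffer layout are just placeholders to illustrate the idea of each thread writing into its own preallocated slot and recording how many entries it actually used:

```cuda
#include <cuda_runtime.h>

constexpr int MAX_ELEMS = 100;   // assumed per-thread upper bound

__global__ void threshold_kernel(const float* samples, int n_samples,
                                 float threshold,
                                 float* out, int* out_counts)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    float* my_out = out + tid * MAX_ELEMS;  // this thread's fixed-size slot
    int count = 0;

    // Illustrative loop: in the real code each thread would scan its own data.
    for (int i = 0; i < n_samples; ++i) {
        if (samples[i] > threshold && count < MAX_ELEMS) {
            my_out[count++] = samples[i];   // keep samples above the threshold
        }
    }
    out_counts[tid] = count;                // number of slots actually used
}
```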
The Thrust library offers vectors on the device, but from what I have read (on this site) people seem to shy away from Thrust. Is it a reasonable solution to include thrust/device_vector.h and thrust/host_vector.h and keep the vectors as they are? What are the disadvantages of using Thrust?
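For reference, this is a minimal sketch of how I imagine the Thrust version would look from the host side (the above_threshold functor and the sizes are made up); as far as I know, thrust::device_vector is a host-side API and cannot be declared inside a kernel:

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>

// Predicate for the threshold check.
struct above_threshold
{
    float threshold;
    __host__ __device__ bool operator()(float x) const { return x > threshold; }
};

int main()
{
    thrust::host_vector<float> h_samples(1024, 0.5f);   // placeholder data
    thrust::device_vector<float> d_samples = h_samples; // host -> device copy

    // Stream compaction: copy only the samples that pass the threshold.
    thrust::device_vector<float> d_kept(d_samples.size());
    auto end = thrust::copy_if(d_samples.begin(), d_samples.end(),
                               d_kept.begin(), above_threshold{0.9f});
    d_kept.resize(end - d_kept.begin());                // shrink to actual count
    return 0;
}
```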
Some background info: this code is part of a pipeline whose previous stages execute on the GPU, and the reason for porting it to the GPU is to get the pipeline running in real time (hopefully). Parallelization is done at a higher level: I will have this entire C++ code as one kernel running across some 800 threads, each of which handles one dispersion measure (DM). As of now, each DM is processed sequentially by calling the C++ code once per DM.
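To make the intended structure concrete, this is roughly how I picture the launch, one thread per DM; N_DM, process_dm_kernel, and the buffer names are placeholders, not my actual code:

```cuda
#include <cuda_runtime.h>

constexpr int N_DM = 800;        // one dispersion measure per thread
constexpr int MAX_ELEMS = 100;   // per-DM output cap from the fixed-size option

__global__ void process_dm_kernel(const float* samples, int n_samples,
                                  float* out, int* out_counts)
{
    int dm = blockIdx.x * blockDim.x + threadIdx.x;
    if (dm >= N_DM) return;
    // ... per-DM processing (the ported C++ code), writing into
    //     out + dm * MAX_ELEMS and recording out_counts[dm] ...
}

int main()
{
    float* d_samples;  float* d_out;  int* d_counts;
    int n_samples = 4096;                               // placeholder size
    cudaMalloc(&d_samples, n_samples * sizeof(float));
    cudaMalloc(&d_out, N_DM * MAX_ELEMS * sizeof(float));
    cudaMalloc(&d_counts, N_DM * sizeof(int));

    int block = 128;
    int grid  = (N_DM + block - 1) / block;             // enough threads for all DMs
    process_dm_kernel<<<grid, block>>>(d_samples, n_samples, d_out, d_counts);
    cudaDeviceSynchronize();

    cudaFree(d_samples); cudaFree(d_out); cudaFree(d_counts);
    return 0;
}
```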