All tutorials and introductory material for GPGPU/CUDA tend to use flat arrays. However, I'm trying to port a piece of code that uses somewhat more sophisticated objects than an array.
I have a 3-dimensional std::vector whose data I want to have on the GPU. Which strategies are there to get this on the GPU?
I can think of only one for now:
- Copy the vector's data on the host into a simpler structure such as an array. However, this seems wasteful because 1) I have to copy the data on the host and then send it to the GPU; and 2) I have to allocate a 3-dimensional array whose dimensions are the maximum element count of any of the vectors. For example, take the 2D vector
{{1, 2, 3, 4, .. 1000}, {1}}. In host memory this is roughly 1001 allocated items, whereas if I were to copy it to a rectangular 2-dimensional array, I would have to allocate 2 * 1000 elements, almost half of them padding.
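To make the cost of that padded copy concrete, here is a minimal host-side sketch in plain C++ (no CUDA calls). The helper names `jagged_count` and `pad_to_rectangle` are hypothetical, and I am assuming `int` elements and a fill value of 0 for the padding. The resulting buffer is the kind of flat array that would then be copied to the device.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements actually stored in a jagged 2D vector.
std::size_t jagged_count(const std::vector<std::vector<int>>& v) {
    std::size_t n = 0;
    for (const auto& row : v) n += row.size();
    return n;
}

// Hypothetical helper: copy a jagged 2D vector into one row-major buffer
// in which every row is padded to the length of the longest row.
// `width` receives that row length; `fill` is the assumed padding value.
std::vector<int> pad_to_rectangle(const std::vector<std::vector<int>>& v,
                                  std::size_t& width, int fill = 0) {
    width = 0;
    for (const auto& row : v) width = std::max(width, row.size());

    std::vector<int> flat(v.size() * width, fill);
    for (std::size_t r = 0; r < v.size(); ++r) {
        // Row r starts at offset r * width in the flat buffer.
        std::copy(v[r].begin(), v[r].end(), flat.data() + r * width);
    }
    return flat;
}
```

For a vector with one row of 1000 elements and one row of a single element, `jagged_count` gives 1001 while the padded buffer holds 2 * 1000 elements, and that is before the extra host-side copy is counted.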
Are there better strategies?