I'm implementing an algorithm on the GPU using CUDA which is certain to give incorrect results when a specific input buffer (3D float vectors) contains duplicate entries. For this reason I want to do a pre-processing step to remove any duplicates which are present.
Since I know the input data contains a significant number of duplicates, explicitly trimming the buffer could free up much-needed memory for some of the later processing steps. Because I have a lot of data to work with, I intend to do this in place, within the already allocated buffer.
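For illustration, here is a minimal sketch of the in-place de-duplication step. It assumes Thrust (shipped with the CUDA Toolkit) and assumes that "duplicate" means all three components compare exactly equal, with no NaNs present. The names dedupInPlace, Float3Less and Float3Equal are hypothetical. The idea is to sort the buffer so equal vectors become adjacent, then let thrust::unique compact the distinct entries to the front and report the new logical length.

```cuda
// Sketch: remove duplicate float3 entries in place on the GPU using Thrust.
// Assumptions: "duplicate" = x, y and z compare exactly equal; no component is NaN
// (NaN would break the strict weak ordering thrust::sort requires).
// Error checking of CUDA calls is omitted for brevity.
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#include <thrust/unique.h>

// Lexicographic (x, then y, then z) ordering so equal vectors end up adjacent.
struct Float3Less {
    __host__ __device__ bool operator()(const float3& a, const float3& b) const {
        if (a.x != b.x) return a.x < b.x;
        if (a.y != b.y) return a.y < b.y;
        return a.z < b.z;
    }
};

// Component-wise exact equality used by thrust::unique.
struct Float3Equal {
    __host__ __device__ bool operator()(const float3& a, const float3& b) const {
        return a.x == b.x && a.y == b.y && a.z == b.z;
    }
};

// Sorts the n entries of d_points (device memory) and moves the unique ones to the front.
// Returns the number of unique entries; the contents of [count, n) are unspecified.
// Note: the original input order is not preserved.
size_t dedupInPlace(float3* d_points, size_t n) {
    thrust::device_ptr<float3> first(d_points);
    thrust::sort(first, first + n, Float3Less());
    thrust::device_ptr<float3> newEnd = thrust::unique(first, first + n, Float3Equal());
    return static_cast<size_t>(newEnd - first);
}

int main() {
    const size_t n = 5;
    float3 h_points[n] = {
        make_float3(1, 2, 3), make_float3(0, 0, 0), make_float3(1, 2, 3),
        make_float3(4, 5, 6), make_float3(0, 0, 0)};

    float3* d_points = nullptr;
    cudaMalloc(&d_points, n * sizeof(float3));
    cudaMemcpy(d_points, h_points, n * sizeof(float3), cudaMemcpyHostToDevice);

    size_t count = dedupInPlace(d_points, n);
    printf("%zu unique of %zu\n", count, n);  // prints "3 unique of 5"

    cudaFree(d_points);  // the allocation itself is still n elements long
    return 0;
}
```

This only compacts the data: the unique entries occupy the first count slots, but the cudaMalloc()'d allocation keeps its original size, so the tail past count is still reserved.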
Does CUDA have a mechanism that allows the end of a cudaMalloc()'d buffer to be trimmed and freed?