
My code relies heavily on the standard library and its algorithms. Is it possible to use the data of a `std::map` as input to a CUDA kernel through some interface? For example, with a

std::map<int, std::vector<float>> 

is it possible to do the search in the map on the GPU instead of on the host?

David
  • Generally, no. The limitation is indicated [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#standard-library). One of the reasons for this is that these standard libraries often depend on functions compiled for the host, i.e. compiled libraries for x86 processors, which the GPU is not. You might be able to implement something yourself. For example, a reference is made to a map function for device code [here](https://devtalk.nvidia.com/default/topic/523766/cuda-programming-and-performance/std-map-in-device-code/). I can't vouch for it. You might find other examples. – Robert Crovella Jul 25 '19 at 20:33
  • With all container classes that take a `std::allocator` it's not too difficult to pass in a custom allocator that allocates managed memory (see the sketch after these comments). However, as the allocator is part of the template, that means modifying all code using the container. And as Robert Crovella has already pointed out, this only gets the raw data onto the device, while all methods of the container class are still only callable from the host. I've used this mainly for transitional versions when porting existing code to the GPU. – tera Jul 25 '19 at 21:15
  • Also see [this question](https://stackoverflow.com/questions/49147565/how-to-search-the-value-from-a-stdmap-when-i-use-cuda). Since you mention "search in the map", your question is arguably a duplicate of that one. – Robert Crovella Jul 25 '19 at 21:25
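
A minimal sketch of the kind of allocator tera describes, using the CUDA runtime API; the names `managed_allocator` and `managed_vector` are illustrative, not an existing library API:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <new>
#include <vector>

// Allocator that hands out CUDA managed (unified) memory, so the raw
// storage of a std::vector is reachable from both host and device code.
template <class T>
struct managed_allocator {
    using value_type = T;

    managed_allocator() = default;
    template <class U>
    managed_allocator(const managed_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = nullptr;
        if (cudaMallocManaged(&p, n * sizeof(T)) != cudaSuccess)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { cudaFree(p); }
};

template <class T, class U>
bool operator==(const managed_allocator<T>&, const managed_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const managed_allocator<T>&, const managed_allocator<U>&) { return false; }

using managed_vector = std::vector<float, managed_allocator<float>>;
```

As the comment notes, `managed_vector::data()` can now be passed to a kernel, but the vector's member functions (searching, resizing, etc.) are still only callable from the host.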

1 Answer


tl;dr: You can't do this, and it wouldn't help you if you could.

The code of most standard-library containers is CPU-specific, and none of them have their non-CPU-specific parts marked with `__host__ __device__` and compiled to be usable in kernels (and that's also the case for the `<algorithm>` code). So, technically, no. (Caveat: things will be a bit more complicated in C++20 with ubiquitous `constexpr`'ing.)
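
For illustration, code that a kernel can call must carry these annotations explicitly; `std::map::find`, for instance, has no device-compiled version (the function names below are made up for the example):

```cpp
// A function usable from both host code and kernels must be annotated:
__host__ __device__ float clamp01(float x) {
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

__global__ void clamp_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = clamp01(data[i]);  // fine: clamp01 is device-capable
    // std::map<int, float>::find(), by contrast, is compiled for the host
    // only; calling it here would be a compile-time error.
}
```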

Also, most of these containers are not designed with parallel or concurrent execution in mind: adding or removing elements of a `std::vector` or a `std::map` from two non-serialized threads, CPU or GPU, will most likely result in data corruption, and possibly worse. So, you don't want to do that even on the CPU.

Another point to remember is memory allocation, which works differently on a GPU than on a CPU; and that, for the most part, you want to avoid dynamic memory allocation within GPU kernels.
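
(Device-side `malloc()` and `free()` do exist in CUDA, for what it's worth, but they draw on a small preallocated device heap and contend across threads. A minimal illustration of the pattern to avoid:)

```cpp
__global__ void per_thread_allocation() {
    // Device-side malloc()/free() allocate from a limited device heap and
    // contend across all running threads; per-thread allocations like this
    // are usually a performance trap.
    float* scratch = static_cast<float*>(malloc(16 * sizeof(float)));
    if (scratch != nullptr) {
        scratch[0] = 0.0f;  // ... use the scratch space ...
        free(scratch);
    }
}
```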

But, you asked, what about using the raw data of a map-of-vectors rather than the code?

Well, if you have a map-of-vectors data structure in main system memory, you will not get any speedup from using a GPU to search it. More generally, you are unlikely to speed up searches of main-memory structures using a discrete GPU: on common hardware platforms, the CPU has higher bandwidth and lower latency to main memory than the GPU does, and search is typically about sporadic, non-consecutive memory accesses, so your hopes will be frustrated.
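
(If you nonetheless wanted to run the search on the device, over data copied into GPU memory, the usual first step is flattening the map-of-vectors into contiguous arrays. A sketch, with illustrative names and error checking elided:)

```cpp
#include <cuda_runtime.h>
#include <map>
#include <vector>

// Flatten a std::map<int, std::vector<float>> into CSR-style arrays and
// copy them to the device.
void upload_flattened(const std::map<int, std::vector<float>>& m,
                      int** d_keys, int** d_offsets, float** d_values)
{
    std::vector<int>   keys;        // sorted: std::map iterates in key order
    std::vector<int>   offsets{0};  // values of keys[i] live in [offsets[i], offsets[i+1])
    std::vector<float> values;
    for (const auto& kv : m) {
        keys.push_back(kv.first);
        values.insert(values.end(), kv.second.begin(), kv.second.end());
        offsets.push_back(static_cast<int>(values.size()));
    }
    cudaMalloc(d_keys,    keys.size()    * sizeof(int));
    cudaMalloc(d_offsets, offsets.size() * sizeof(int));
    cudaMalloc(d_values,  values.size()  * sizeof(float));
    cudaMemcpy(*d_keys,    keys.data(),    keys.size()    * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(*d_offsets, offsets.data(), offsets.size() * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(*d_values,  values.data(),  values.size()  * sizeof(float), cudaMemcpyHostToDevice);
}

// A kernel can then binary-search the sorted key array:
__device__ int find_key(const int* keys, int n, int key) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] < key) lo = mid + 1; else hi = mid;
    }
    return (lo < n && keys[lo] == key) ? lo : -1;
}
```

Since `std::map` iterates in key order, the flattened key array is already sorted, so a device-side binary search works; but the bandwidth/latency argument above still applies to getting the data there in the first place.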

einpoklum