Can NPP functions, more specifically the NPPS signal-processing functions (https://docs.nvidia.com/cuda/npp/group__npps.html), be called as device functions?
If I write a __global__ function (a kernel), can I call NPPS functions such as nppsMaxIndx_32f (which computes the maximum of a vector and its index) inside it?
Example: I have 100 vectors of 10000 floats each. If I do this in host code, I have to make 100 calls to the NPP function, one per vector.
If I instead launch a __global__ function with 100 threads, where each thread calls the NPP function on its own vector so that all 100 calls run simultaneously, will this work? In other words, can nppsMaxIndx_32f be called as a device function?
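Whether nppsMaxIndx_32f itself can run inside a kernel is exactly what is being asked, so the following sketch sidesteps NPP and writes the per-vector max/index search by hand, which still processes all 100 vectors in a single kernel launch. It is a minimal sketch, assuming the vectors are stored contiguously (vector b starts at offset b * len) and len >= 1. The names batchedMaxIndex, batchedMaxIndexHost and kThreads are hypothetical and not part of NPP. Ties resolve to the lowest index; whether that matches NPP's own tie rule has not been checked.

```cuda
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Threads per block; must be a power of two for the tree reduction below.
constexpr int kThreads = 256;

// One block per vector: block b scans vectors[b*len .. b*len + len - 1] and
// writes its maximum and the index of that maximum (lowest index on ties).
// Must be launched with blockDim.x == kThreads.
__global__ void batchedMaxIndex(const float* vectors, int len,
                                float* outMax, int* outIdx)
{
    __shared__ float sVal[kThreads];
    __shared__ int sIdx[kThreads];

    const float* v = vectors + static_cast<size_t>(blockIdx.x) * len;

    // Each thread scans a strided slice, so neighbouring threads read
    // neighbouring elements (coalesced loads).
    float best = -FLT_MAX;
    int bestIdx = -1;  // -1 means "this thread saw no element"
    for (int i = threadIdx.x; i < len; i += blockDim.x) {
        if (bestIdx < 0 || v[i] > best) { best = v[i]; bestIdx = i; }
    }
    sVal[threadIdx.x] = best;
    sIdx[threadIdx.x] = bestIdx;
    __syncthreads();

    // Shared-memory tree reduction over (value, index) pairs.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) {
            float ov = sVal[threadIdx.x + s];
            int oi = sIdx[threadIdx.x + s];
            float mv = sVal[threadIdx.x];
            int mi = sIdx[threadIdx.x];
            // Take the partner's candidate if it is larger, or equal with a lower index.
            if (oi >= 0 && (mi < 0 || ov > mv || (ov == mv && oi < mi))) {
                sVal[threadIdx.x] = ov;
                sIdx[threadIdx.x] = oi;
            }
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        outMax[blockIdx.x] = sVal[0];
        outIdx[blockIdx.x] = sIdx[0];
    }
}

// Host helper: copies numVectors contiguous vectors of len floats to the GPU,
// runs one kernel launch for all of them, and copies the results back.
// Returns false if any CUDA call fails.
bool batchedMaxIndexHost(const std::vector<float>& vectors, int numVectors, int len,
                         std::vector<float>& maxOut, std::vector<int>& idxOut)
{
    assert(numVectors > 0 && len > 0);
    assert(vectors.size() == static_cast<size_t>(numVectors) * len);
    maxOut.resize(numVectors);
    idxOut.resize(numVectors);

    float* dVec = nullptr;
    float* dMax = nullptr;
    int* dIdx = nullptr;
    const size_t bytes = vectors.size() * sizeof(float);
    bool ok = cudaMalloc(&dVec, bytes) == cudaSuccess &&
              cudaMalloc(&dMax, numVectors * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dIdx, numVectors * sizeof(int)) == cudaSuccess &&
              cudaMemcpy(dVec, vectors.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        batchedMaxIndex<<<numVectors, kThreads>>>(dVec, len, dMax, dIdx);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(maxOut.data(), dMax, numVectors * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(idxOut.data(), dIdx, numVectors * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dVec);
    cudaFree(dMax);
    cudaFree(dIdx);
    return ok;
}
```

For the example above this would be called as batchedMaxIndexHost(data, 100, 10000, maxes, indices). Giving each vector a whole block rather than a single thread is a design choice: with only 100 threads most of the GPU would sit idle, and at every step neighbouring threads would read addresses 10000 floats apart.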