I have an MPI program that distributes a large array among several processes on a cluster.
Each process calculates the sum of its own array elements and returns the result to the host.
I now want to run a parallel prefix scan on the array elements held by each process.
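
To make the question concrete, this is roughly the per-process computation I have in mind, written as a plain CPU loop in C. The function name is made up for illustration, and the choice of an inclusive scan is only an assumption. This loop is the part I would like to run on the GPU instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MPI or CUDPP): inclusive prefix sum
 * over the chunk of the array owned by one process.
 * out[i] receives in[0] + ... + in[i]; the return value is the chunk total,
 * i.e. the same per-process sum that is currently sent back to the host. */
long local_inclusive_scan(const int *in, long *out, size_t n)
{
    assert(in != NULL && out != NULL);

    long running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];   /* accumulate the elements seen so far */
        out[i] = running;   /* store the running total at position i */
    }
    return running;
}
```

For a local chunk {3, 1, 4, 1, 5} this would fill the output with {3, 4, 8, 9, 14} and return 14.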
Is this possible with CUDPP?
Has anyone used Open MPI and CUDPP together?