
Is there any way to find the rank of each element within a matrix row, separately for each row, using CUDA, or are there any functions provided by NVIDIA for this?

2 Answers


I don't know of a built-in ranking or argsort function in CUDA or any of the libraries I am familiar with.

You could certainly build such a function out of lower-level operations using thrust for example.

Here is a (non-optimized) outline of a possible solution approach using thrust:

$ cat t84.cu
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sort.h>
#include <thrust/sequence.h>
#include <thrust/functional.h>
#include <thrust/adjacent_difference.h>
#include <thrust/transform.h>
#include <thrust/scan.h>
#include <thrust/iterator/permutation_iterator.h>
#include <iostream>
#include <iterator>

typedef int mytype;

// Maps the difference between neighboring sorted values to a flag:
// 0 if the values are equal, 1 if they differ.
struct clamp
{
  template <typename T>
  __host__ __device__
  T operator()(T data) const {
    if (data == 0) return 0;
    return 1;}
};

int main(){

  mytype data[]  = {4,1,7,1};
  int dsize = sizeof(data)/sizeof(data[0]);
  thrust::device_vector<mytype> d_data(data, data+dsize);
  thrust::device_vector<int> d_idx(dsize);
  thrust::device_vector<int> d_result(dsize);

  // original positions 0..dsize-1
  thrust::sequence(d_idx.begin(), d_idx.end());

  // sort the values, carrying their original positions along (argsort)
  thrust::sort_by_key(d_data.begin(), d_data.end(), d_idx.begin(), thrust::less<mytype>());
  // same type as the data, so small floating-point differences are not truncated to 0
  thrust::device_vector<mytype> d_diff(dsize);
  thrust::adjacent_difference(d_data.begin(), d_data.end(), d_diff.begin());
  d_diff[0] = 0;
  // 1 where the sorted value changes, 0 where it repeats
  thrust::transform(d_diff.begin(), d_diff.end(), d_diff.begin(), clamp());
  // running sum of the flags = rank of each value in sorted order
  thrust::inclusive_scan(d_diff.begin(), d_diff.end(), d_diff.begin());

  // scatter the ranks back to the original positions
  thrust::copy(d_diff.begin(), d_diff.end(), thrust::make_permutation_iterator(d_result.begin(), d_idx.begin()));
  thrust::copy(d_result.begin(), d_result.end(), std::ostream_iterator<int>(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -arch=sm_61 -o t84 t84.cu
$ ./t84
1,0,2,0,
$
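For checking the GPU output on the host, the same sort, flag, scan and scatter steps can be mirrored with C++ standard algorithms. This is a minimal sketch, not part of thrust; `dense_rank` is a hypothetical name. Equal values share a rank and ranks start at 0, matching the `1,0,2,0,` output above. It compares neighboring sorted values with `!=` rather than subtracting them, so it behaves the same for integer and floating-point data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical host-side reference for the thrust pipeline:
// argsort, flag value changes, inclusive scan, scatter back.
template <typename T>
std::vector<int> dense_rank(const std::vector<T>& data)
{
    const std::size_t n = data.size();
    // indices are stored as int, like d_idx in the thrust version
    assert(n <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    std::vector<int> result(n);
    if (n == 0) return result;

    // argsort (thrust::sequence + thrust::sort_by_key)
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(),
                     [&](int a, int b) { return data[a] < data[b]; });

    // flag positions whose value differs from the previous sorted value
    // (adjacent_difference + clamp); the first flag stays 0
    std::vector<int> flag(n, 0);
    for (std::size_t i = 1; i < n; ++i)
        flag[i] = (data[idx[i]] != data[idx[i - 1]]) ? 1 : 0;

    // running sum of the flags = dense rank in sorted order
    std::partial_sum(flag.begin(), flag.end(), flag.begin());

    // scatter back to the original positions (permutation_iterator copy)
    for (std::size_t i = 0; i < n; ++i)
        result[idx[i]] = flag[i];
    return result;
}
```

For example, `dense_rank(std::vector<int>{4, 1, 7, 1})` returns `{1, 0, 2, 0}`.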
Robert Crovella
  • Thank you. Why is it non-optimized? If I am not mistaken, your solution operates on a vector. Since I want to perform this task on each row of a matrix, does your solution work in that case? Can I use it from PyCUDA? – Rasmi Ranjan Khansama Feb 07 '17 at 04:07
  • It is non-optimized because I haven't thought about all the different ways to create such a function, so I imagine there are more optimal ways. Even with what is shown, there may be clever uses of thrust fusion to improve performance. The method outlined is an attempt to show how the row-ranking function could be implemented, as a concept sketch. If you want to extend it to work on matrix rows all at once, I imagine it could be done, as thrust operations can be extended that way (look at the thrust examples). Regarding pyCUDA, if you google "thrust pycuda" you'll find interop examples. – Robert Crovella Feb 07 '17 at 04:21
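One way to extend the vector version to every row at once (an assumption about how it could be structured, not something spelled out above) is: sort by value with `thrust::stable_sort_by_key`, sort again by row key with `thrust::stable_sort_by_key` so each row becomes contiguous while keeping its values ordered, and replace the plain scan with `thrust::inclusive_scan_by_key` keyed on the row so the rank restarts at every row. The sketch below emulates those steps on the host with C++ standard algorithms for a row-major matrix; `dense_rank_rows` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: dense rank (from 0) of each element within its own
// row of a row-major nrows x ncols matrix.
template <typename T>
std::vector<int> dense_rank_rows(const std::vector<T>& m, int nrows, int ncols)
{
    assert(nrows >= 0 && ncols >= 0);
    const std::size_t n = static_cast<std::size_t>(nrows) * ncols;
    assert(m.size() == n);

    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);

    // 1. sort flat indices by value
    std::stable_sort(idx.begin(), idx.end(),
                     [&](int a, int b) { return m[a] < m[b]; });
    // 2. stable sort by row: rows become contiguous, still ordered by value
    std::stable_sort(idx.begin(), idx.end(),
                     [&](int a, int b) { return a / ncols < b / ncols; });

    // 3-4. segmented running sum of "value changed" flags;
    //      the first element of each row keeps rank 0
    std::vector<int> rank(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        const bool same_row = idx[i] / ncols == idx[i - 1] / ncols;
        if (same_row)
            rank[i] = rank[i - 1] + (m[idx[i]] != m[idx[i - 1]] ? 1 : 0);
    }

    // 5. scatter back to the original row-major positions
    std::vector<int> result(n);
    for (std::size_t i = 0; i < n; ++i)
        result[idx[i]] = rank[i];
    return result;
}
```

For the 2 x 4 matrix `{4, 1, 7, 1, 3, 3, 2, 9}` this gives `{1, 0, 2, 0, 1, 1, 0, 2}`: each row is ranked independently.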

If you are using CUDA, the concept of rank is not the same as in other programming models such as OpenMP or MPI. In CUDA you work at the level of the whole grid instead: each thread computes its own global index from the threadIdx.x and blockIdx.x built-in variables (together with blockDim.x).
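To connect the thread-index idea to the original question, here is a minimal sketch assuming one thread per matrix element, with the row-major element index derived from the global thread index. It is written as plain C++ so it runs on the host: the loop in `rank_all` stands in for the grid, and in a real kernel `dense_rank_of_element` would be a `__device__` function. Both names are hypothetical. Each "thread" counts the distinct smaller values in its row, which is O(ncols^2) per element and only sensible for short rows.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-thread work: dense rank of element (row, col) in its row,
// i.e. the number of distinct values in that row smaller than it.
int dense_rank_of_element(const int* m, int ncols, int row, int col)
{
    const int* r = m + static_cast<long>(row) * ncols;
    const int x = r[col];
    int rank = 0;
    for (int j = 0; j < ncols; ++j) {
        if (r[j] >= x) continue;
        // count r[j] only at its first occurrence, so duplicates count once
        bool first = true;
        for (int k = 0; k < j && first; ++k)
            first = (r[k] != r[j]);
        if (first) ++rank;
    }
    return rank;
}

// Host emulation of a 1D grid. In a CUDA kernel the loop goes away and each
// thread computes
//   int idx = blockIdx.x * blockDim.x + threadIdx.x;
//   if (idx < nrows * ncols) { ... }
std::vector<int> rank_all(const std::vector<int>& m, int nrows, int ncols)
{
    assert(static_cast<long>(m.size()) == static_cast<long>(nrows) * ncols);
    std::vector<int> out(m.size());
    for (int idx = 0; idx < nrows * ncols; ++idx) {
        const int row = idx / ncols;  // row-major layout
        const int col = idx % ncols;
        out[idx] = dense_rank_of_element(m.data(), ncols, row, col);
    }
    return out;
}
```

Because every element is handled independently, no sorting or inter-thread communication is needed, at the cost of redundant work per row compared with the sort-and-scan approach.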