How to properly implement an inline device function that returns a vector to another device function?

Question

I want to properly implement an inlined device function that fills a vector of dynamic size and returns the filled vector, like this:

__device__  inline   thrust::device_vector<double> make_array(double zeta, int l)
{
  thrust::device_vector<double> ret;
  int N =(int)(5*l+zeta); //the size of the array  will depend on l and zeta, in a complex way...
  // Make sure of sufficient memory allocation
  ret.reserve(N);
  // Resize array
  ret.resize(N);
  //fill it:
  //for(int i=0;i<N;i++)
  // ...;
  return ret;
}

My goal is to use the content of the returned vector in another device function like:

__device__  inline double use_array(double zeta,int l)
{
  thrust::device_vector<double> array = make_array(zeta, l);

  double result = 0;

  for(int i=0; i<array.size(); i++)
    result += array[i];

  return result;
}

How can I do this properly? My feeling is that a thrust vector is designed for this type of task, but I want to be sure. What is the standard CUDA approach to this task?

score 4 · Accepted Answer · edited May 23 '17 at 12:04

thrust::device_vector is not usable in device code.

However you can return a pointer to a dynamically allocated area, like so:

#include <assert.h>

template <typename T>
__device__  T* make_array(T zeta, int l)
{
  int N =(int)(5*l+zeta); //the size of the array  will depend on l and zeta, in a complex way...
  T *ret = (T *)malloc(N*sizeof(T));
  assert(ret != NULL);  // error checking

  //fill it:
  //for(int i=0;i<N;i++)
  // ret[i] = ...;
  return ret;
}

The inline keyword should not be necessary. The compiler will aggressively inline functions wherever possible.
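As a minimal sketch of how the pieces could fit together (not necessarily how the asker's real code should look), the version below also hands the array length back to the caller and releases the memory with in-kernel free once the sum is computed. The n_out parameter, the placeholder fill ret[i] = i, the sum_kernel kernel and the run_use_array host helper are all hypothetical names introduced for illustration.

```cuda
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

// Allocates an array on the device heap and fills it.
// The caller owns the returned pointer and must free() it in device code.
// n_out (hypothetical parameter) receives the number of elements.
template <typename T>
__device__ T* make_array(T zeta, int l, int* n_out)
{
  int N = (int)(5 * l + zeta);   // same size rule as in the question
  *n_out = 0;
  if (N <= 0) return NULL;       // nothing to allocate
  T* ret = (T*)malloc(N * sizeof(T));
  assert(ret != NULL);           // in-kernel malloc returns NULL on failure
  for (int i = 0; i < N; i++)
    ret[i] = (T)i;               // placeholder fill, for illustration only
  *n_out = N;
  return ret;
}

// Consumes the array and frees it before returning.
__device__ double use_array(double zeta, int l)
{
  int N = 0;
  double* array = make_array(zeta, l, &N);
  double result = 0;
  for (int i = 0; i < N; i++)
    result += array[i];
  free(array);                   // free(NULL) is ignored in device code
  return result;
}

// Hypothetical single-thread kernel that stores the sum in *out.
__global__ void sum_kernel(double zeta, int l, double* out)
{
  *out = use_array(zeta, l);
}

// Hypothetical host helper: launches the kernel and copies the result back.
double run_use_array(double zeta, int l)
{
  double* d_out = NULL;
  double h_out = 0;
  cudaError_t err = cudaMalloc(&d_out, sizeof(double));
  assert(err == cudaSuccess);
  sum_kernel<<<1, 1>>>(zeta, l, d_out);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  err = cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
  assert(err == cudaSuccess);
  cudaFree(d_out);
  return h_out;
}
```

With this placeholder fill, run_use_array(2.0, 2) allocates 12 elements holding 0 through 11. Note that every thread calling make_array performs its own allocation from the device heap, which is finite; the programming guide describes adjusting its size with cudaDeviceSetLimit and cudaLimitMallocHeapSize.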

answered Dec 17 '15 at 05:46

Robert Crovella


Is it possible to free the memory allocated for the array, ret, from another device function once the computation is done? – user3116936 Dec 17 '15 at 06:18
Yes, as long as you are appropriately careful in usage, you can use in-kernel `free`. This is covered in the [documentation](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations). – Robert Crovella Dec 17 '15 at 06:47
Thanks a lot for your advice. – user3116936 Dec 17 '15 at 16:58
