I am having some trouble with creating a functor properly to access a device vector. Basically, I have two device vectors that I would like to use inside a functor. The functor is called during a for_each.

Here is my functor:

struct likelihood_functor
{
    int N;
    float* v1;
    float* v2;

    likelihood_functor(int _N, float* _v1, float* _v2) : N(_N), v1(_v1), v2(_v2) {}

    template <typename Tuple>
    __host__ __device__ void operator()(Tuple t)
    {
        float A = thrust::get<0>(t);
        float rho = thrust::get<1>(t);
        float mux = thrust::get<2>(t);
        float muy = thrust::get<3>(t);
        float sigx = thrust::get<4>(t);
        float sigy = thrust::get<5>(t);

        thrust::device_ptr<float> v1_p(v1);
        thrust::device_vector<float> X(v1_p, v1_p + N);
        thrust::device_ptr<float> v2_p(v2);
        thrust::device_vector<float> Y(v2_p, v2_p + N);

        thrust::get<6>(t) = 600*logf(A)
            - 600/2*logf(sigx*sigx*sigy*sigy*(1-rho*rho))
            - thrust::reduce(X.begin(), X.end())
            - thrust::reduce(Y.begin(), Y.end())
            - 2*rho/(sigx*sigy);
    }
};

And here is my main():

int main(void)
{
    // create a 2D dataset
    const int N = 2500; // number of counts

    thrust::device_vector<float> data_x(N);
    thrust::device_vector<float> data_y(N);

    thrust::counting_iterator<unsigned int> begin(0);
    thrust::transform(begin,
        begin + N,
        data_x.begin(),
        get_normal(5.f,1.f,2.f));
    thrust::transform(begin,
        begin + N,
        data_y.begin(),
        get_normal(5.f,1.f,2.f));

    //
    // Some code here to initiate A_n, rho_n, mux_n etc...
    //

    // apply the transformation
    thrust::for_each(
        thrust::make_zip_iterator(
            thrust::make_tuple(A_n.begin(), rho_n.begin(), mux_n.begin(), muy_n.begin(), sigx_n.begin(), sigy_n.begin(), L.begin())
        ),
        thrust::make_zip_iterator(
            thrust::make_tuple(A_n.end(), rho_n.end(), mux_n.end(), muy_n.end(), sigx_n.end(), sigy_n.end(), L.end())
        ),
        likelihood_functor(N, thrust::raw_pointer_cast(&(data_x[0])), thrust::raw_pointer_cast(&(data_y[0])))
    );

    // print the output
    for(int i=0; i<4096; i++)
    {
        std::cout << "[" << i << "] : " << L[i] << std::endl;
    }
}

The code compiles, but it does not run. I believe this is because the device_vectors X and Y are not constructed properly inside my functor.

I have used the same code to create X and Y in my main function, and when I do this, the program runs fine (in this case, I do not call the functor). What is different inside a functor that makes something that works in the main program fail there?

Is there another way of doing what I am trying to do?

Thank you for the help!

azhew

1 Answer

EDIT: The answer below is no longer correct. Thrust algorithms are usable in a variety of ways in CUDA device code, some of which are covered here. thrust::device_vector is still generally not usable in CUDA device code (although the underlying data is).
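
As one illustration of the note above, here is a minimal sketch of calling a Thrust algorithm from inside a functor with the sequential execution policy `thrust::seq`, so that each thread runs its own reduction over a raw device pointer. The names `scaled_sum_functor` and `scaled_sums` are hypothetical and not from the question, and the sketch assumes a Thrust version that supports execution policies in device code.

```cuda
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/transform.h>

// Hypothetical functor: multiplies each input value by the sum of a device array.
struct scaled_sum_functor
{
    int N;
    const float* v; // raw pointer to device memory, not a device_vector

    scaled_sum_functor(int _N, const float* _v) : N(_N), v(_v) {}

    // __device__ only: v points to device memory, so it must not be
    // dereferenced on the host.
    __device__ float operator()(float scale) const
    {
        // thrust::seq runs the reduction sequentially in the calling thread.
        return scale * thrust::reduce(thrust::seq, v, v + N);
    }
};

// Hypothetical helper: applies the functor to every element of `scales`.
thrust::host_vector<float> scaled_sums(const thrust::device_vector<float>& data,
                                       const thrust::device_vector<float>& scales)
{
    thrust::device_vector<float> out(scales.size());
    thrust::transform(scales.begin(), scales.end(), out.begin(),
        scaled_sum_functor(static_cast<int>(data.size()),
                           thrust::raw_pointer_cast(data.data())));
    thrust::host_vector<float> result = out; // copy back to the host
    return result;
}
```

Every thread repeats the same reduction here, so this is likely to be slow when the array is large; it only shows that the call itself is possible.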


Thrust algorithms (e.g. transformations, reductions, etc.) cannot be used in CUDA device code.

That is, any function marked __global__ or __device__ cannot use Thrust algorithms (for example, thrust::reduce).

Therefore your functor will not work in device code, since it uses thrust constructs (such as thrust::reduce).

I realize you say that your code compiles, but I'm not really sure I believe that. If I try to declare a thrust::device_vector inside __device__ code, I get compile errors. Since the code you've shown here is incomplete in many ways, I'm not able to easily demonstrate that with your code, due to other problems with what you've posted.

Robert Crovella
  • Thank you for your answer. I am a little confused though: I have seen examples of functors called in a for_each where "thrust::get" was used, and they were __device__ functions. (this is actually why I did this in my function: found it directly in the thrust quick start guide or something...). If I cannot use thrust in a __device__ function, then I should not be able to use thrust::get, but I can... Now, if it really is impossible to access my datasets inside a device function, what would be a good way around it? Thanks a lot. – azhew Nov 23 '13 at 04:51
  • You can use `thrust::get`. You can't use `thrust::reduce`. A functor in thrust is called on a per-thread basis. Therefore you want to recast your algorithm to make the processing done by the functor something that is sensible to do in a single thread. You can certainly recast your usage of `thrust::reduce` as a simple for-loop executed by a single thread. Performance may be a challenge. And nowhere did I say it's impossible to access your data sets within a device function. You just can't use thrust algorithms. – Robert Crovella Nov 23 '13 at 05:48
  • Ah, I understand it better now, thanks a lot. Indeed changing my algorithm seems like the right approach. I can do my reductions before calling the functor. I will do that. Thanks again! – azhew Nov 23 '13 at 05:51
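
The approach settled on in the comments (doing the reductions before calling the functor) might look like the sketch below. It is an illustration under assumptions, not the asker's final code: the helper name `compute_likelihoods` is hypothetical, and `mux` and `muy` are left out of the tuple because the formula shown in the question does not use them.

```cuda
#include <thrust/device_vector.h>
#include <thrust/for_each.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/reduce.h>
#include <thrust/tuple.h>
#include <cmath>

// The two sums are plain floats computed once on the host side,
// so no Thrust algorithm or device_vector is needed inside operator().
struct likelihood_functor
{
    float sum_x;
    float sum_y;

    likelihood_functor(float _sum_x, float _sum_y) : sum_x(_sum_x), sum_y(_sum_y) {}

    template <typename Tuple>
    __host__ __device__ void operator()(Tuple t)
    {
        float A    = thrust::get<0>(t);
        float rho  = thrust::get<1>(t);
        float sigx = thrust::get<2>(t);
        float sigy = thrust::get<3>(t);

        // Same expression as in the question, with the reductions replaced
        // by the precomputed sums.
        thrust::get<4>(t) = 600*logf(A)
            - 600/2*logf(sigx*sigx*sigy*sigy*(1-rho*rho))
            - sum_x
            - sum_y
            - 2*rho/(sigx*sigy);
    }
};

// Hypothetical helper: reduces the data once, then evaluates every parameter set.
thrust::host_vector<float> compute_likelihoods(
    const thrust::device_vector<float>& data_x,
    const thrust::device_vector<float>& data_y,
    const thrust::device_vector<float>& A_n,
    const thrust::device_vector<float>& rho_n,
    const thrust::device_vector<float>& sigx_n,
    const thrust::device_vector<float>& sigy_n)
{
    // Reductions launched from host code, where thrust::reduce is always allowed.
    float sum_x = thrust::reduce(data_x.begin(), data_x.end());
    float sum_y = thrust::reduce(data_y.begin(), data_y.end());

    thrust::device_vector<float> L(A_n.size());
    thrust::for_each(
        thrust::make_zip_iterator(
            thrust::make_tuple(A_n.begin(), rho_n.begin(), sigx_n.begin(), sigy_n.begin(), L.begin())
        ),
        thrust::make_zip_iterator(
            thrust::make_tuple(A_n.end(), rho_n.end(), sigx_n.end(), sigy_n.end(), L.end())
        ),
        likelihood_functor(sum_x, sum_y)
    );

    thrust::host_vector<float> result = L; // copy back to the host
    return result;
}
```

Because the sums do not depend on the per-element parameters, computing them once avoids repeating the same reduction in every thread.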