5

I am writing code to compute the dot product of two vectors using the CUBLAS dot product routine, but it returns the value in host memory. I want to use the dot product for further computation on the GPGPU only. How can I keep the value on the GPGPU and use it for further computations without making an explicit copy from CPU to GPGPU?

talonmies
user1439690

3 Answers

15

You can do this in CUBLAS as long as you use the "V2" API. The newer API includes a function cublasSetPointerMode which you can use to set the API to assume that all routines which return a scalar value will be passed a device pointer rather than a host pointer. This is discussed in Section 2.4 of the latest CUBLAS documentation. For example:

#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <stdio.h>

int main(void)
{
    const int nvals = 10;
    const size_t sz = sizeof(double) * (size_t)nvals;
    double x[nvals], y[nvals];
    double *x_, *y_, *result_;
    double result=0., resulth=0.;

    for(int i=0; i<nvals; i++) {
        x[i] = y[i] = (double)(i)/(double)(nvals);
        resulth += x[i] * y[i];
    }

    cublasHandle_t h;
    cublasCreate(&h);
    cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE);
    
    cudaMalloc( (void **)(&x_), sz);
    cudaMalloc( (void **)(&y_), sz);
    cudaMalloc( (void **)(&result_), sizeof(double) );

    cudaMemcpy(x_, x, sz, cudaMemcpyHostToDevice);
    cudaMemcpy(y_, y, sz, cudaMemcpyHostToDevice);

    cublasDdot(h, nvals, x_, 1, y_, 1, result_);

    cudaMemcpy(&result, result_, sizeof(double), cudaMemcpyDeviceToHost);

    printf("%f %f\n", resulth, result);

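    // Release device buffers before tearing down the cuBLAS handle.
    cudaFree(x_);
    cudaFree(y_);
    cudaFree(result_);
    cublasDestroy(h);
    return 0;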
}

Using CUBLAS_POINTER_MODE_DEVICE makes cublasDdot assume that result_ is a device pointer, and there is no attempt made to copy the result back to the host. Note that this makes routines like dot asynchronous, so you might need to keep an eye on synchronization between device and host.
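To show the "further computation" part of the question, here is a minimal sketch in which the device-resident dot product is consumed by a kernel without ever visiting the host. The kernel `scale_by` and the helper `scale_by_own_dot` are hypothetical names made up for this illustration. It also assumes the handle has not been moved off the default stream with cublasSetStream, so the kernel launch is ordered after the dot product. Error handling is kept to a bare minimum.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <stdio.h>

// Hypothetical kernel: multiplies every element of v by the scalar stored
// at the device address s.
__global__ void scale_by(double *v, const double *s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= *s;
}

// Hypothetical helper: replaces x with (x . x) * x. The dot product is
// written to device memory and read by the kernel; it is never copied to
// the host. Returns 0 on success, 1 otherwise.
int scale_by_own_dot(double *x, int n)
{
    const size_t sz = sizeof(double) * (size_t)n;
    double *x_ = NULL, *dot_ = NULL;
    cublasHandle_t h;

    cudaMalloc((void **)&x_, sz);
    cudaMalloc((void **)&dot_, sizeof(double));
    cudaMemcpy(x_, x, sz, cudaMemcpyHostToDevice);

    cublasCreate(&h);
    cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE);

    // Result goes to dot_ in device memory; the call may return before
    // the value has actually been computed.
    cublasStatus_t st = cublasDdot(h, n, x_, 1, x_, 1, dot_);

    // Same (default) stream as the cuBLAS call, so this runs after it.
    scale_by<<<(n + 255) / 256, 256>>>(x_, dot_, n);

    // Blocking copy: also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(x, x_, sz, cudaMemcpyDeviceToHost);

    cublasDestroy(h);
    cudaFree(x_);
    cudaFree(dot_);
    return (st == CUBLAS_STATUS_SUCCESS && err == cudaSuccess) ? 0 : 1;
}

int main(void)
{
    double x[3] = {1.0, 2.0, 2.0};
    if (scale_by_own_dot(x, 3) != 0) {
        fprintf(stderr, "CUDA/cuBLAS call failed\n");
        return 1;
    }
    printf("%f %f %f\n", x[0], x[1], x[2]);
    return 0;
}
```

If the cuBLAS handle and the kernel use different streams, you would need an event or an explicit synchronization between the two before the kernel reads `dot_`.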

alfC
talonmies
  • @harrism, I just tried this code after removing the line `cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE);` and it still works (gives the correct result and doesn't segfault). Is that the expected behavior? `CUDA 12, nvcc 12.0, arch 7.5, Quadro RTX 5000`. – alfC Mar 05 '23 at 11:54
5

You can't, exactly, using CUBLAS. As per talonmies' answer, starting with the CUBLAS V2 API (CUDA 4.0) the return value can be a device pointer. Refer to his answer. But if you are using the V1 API, it's a single value, so it's pretty trivial to pass it as an argument to a kernel that uses it; you don't need an explicit cudaMemcpy (but there is one implied in order to return a host value).
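As a rough illustration of passing the scalar as a kernel argument, here is a sketch that lets the dot product come back to the host and then hands it to a kernel by value. Note the assumptions: it uses the V2 API in its default host pointer mode rather than the legacy V1 header, and the kernel `add_scalar` and helper `add_own_dot` are hypothetical names invented for the example.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <stdio.h>

// Hypothetical kernel: adds the scalar s (passed by value) to each element.
__global__ void add_scalar(double *v, double s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += s;
}

// Hypothetical helper: replaces x with x + (x . x). The dot product is
// returned to the host by cuBLAS and passed to the kernel as an argument.
// Returns 0 on success, 1 otherwise.
int add_own_dot(double *x, int n)
{
    const size_t sz = sizeof(double) * (size_t)n;
    double *x_ = NULL;
    double dot = 0.0;
    cublasHandle_t h;

    cudaMalloc((void **)&x_, sz);
    cudaMemcpy(x_, x, sz, cudaMemcpyHostToDevice);

    // A new handle starts in CUBLAS_POINTER_MODE_HOST, so &dot is a host
    // address and the call blocks until the value is available.
    cublasCreate(&h);
    cublasStatus_t st = cublasDdot(h, n, x_, 1, x_, 1, &dot);

    // No explicit cudaMemcpy for the scalar: it travels as a kernel argument.
    add_scalar<<<(n + 255) / 256, 256>>>(x_, dot, n);

    cudaError_t err = cudaMemcpy(x, x_, sz, cudaMemcpyDeviceToHost);

    cublasDestroy(h);
    cudaFree(x_);
    return (st == CUBLAS_STATUS_SUCCESS && err == cudaSuccess) ? 0 : 1;
}

int main(void)
{
    double x[3] = {1.0, 2.0, 2.0};
    if (add_own_dot(x, 3) != 0) {
        fprintf(stderr, "CUDA/cuBLAS call failed\n");
        return 1;
    }
    printf("%f %f %f\n", x[0], x[1], x[2]);
    return 0;
}
```

The cost of this approach is the device-to-host round trip and the host-side stall inside cublasDdot, which is exactly what device pointer mode avoids.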

Starting with the Tesla K20 GPU and CUDA 5, you will be able to call CUBLAS routines from device kernels using CUDA Dynamic Parallelism. This means you would be able to call cublasSdot (for example) from inside a __global__ kernel function, and your result would therefore be returned on the GPU.

harrism
  • Mark, that isn't true. Since about CUBLAS 4.0 (or whenever the V2 API was released) the result argument can be a host or device pointer and the call will happily keep the result in device memory. – talonmies Sep 13 '12 at 07:17
  • Thanks for the correction. Could have given me time to edit my answer (huge time zone difference). :) – harrism Sep 14 '12 at 00:21
  • I tried @talonmies' answer and it still works even if I remove the line `cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE);` (gives the correct result and doesn't segfault). Is the library magically detecting whether the memory is on the GPU or the CPU? Is this the expected behavior? I would say that, if it is, it is very convenient, because I mostly use `cublasSetPointerMode` to control scalar parameters (`alpha`, `beta`) of the operations and not result or input locations. `CUDA 12, nvcc 12.0, arch 7.5, Quadro RTX 5000`. – alfC Mar 06 '23 at 01:34
0

Set pointer mode to device using cublasSetPointerMode().

From cuBLAS docs:

cublasSetPointerMode()

This function sets the pointer mode used by the cuBLAS library. The default is for the values to be passed by reference on the host.

Example:

cublasHandle_t handle;
cublasCreate(&handle);
cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);  // Make the values be passed by reference on the device.

Warning: cublasSetPointerMode also affects pointers used as input parameters (e.g., alpha for cublasSgemm). You will need to store the parameters on the device or set the pointer mode back to host mode.
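One way to handle the warning is to switch to device mode only around the calls that need it and restore the previous mode afterwards. The sketch below does that with cublasGetPointerMode and cublasSetPointerMode; the helper name `dot_then_axpy` and its parameters are hypothetical, chosen just for this example, and error handling is minimal.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <stdio.h>

// Hypothetical helper: computes x . x into device memory (device pointer
// mode), restores the previous pointer mode, then runs y = alpha * x + y
// with alpha read from the host. The dot product is copied back into
// *dot_out only so the caller can inspect it. Returns 0 on success.
int dot_then_axpy(const double *x, double *y, int n, double alpha,
                  double *dot_out)
{
    const size_t sz = sizeof(double) * (size_t)n;
    double *x_ = NULL, *y_ = NULL, *dot_ = NULL;
    cublasHandle_t h;
    cublasPointerMode_t saved;

    cudaMalloc((void **)&x_, sz);
    cudaMalloc((void **)&y_, sz);
    cudaMalloc((void **)&dot_, sizeof(double));
    cudaMemcpy(x_, x, sz, cudaMemcpyHostToDevice);
    cudaMemcpy(y_, y, sz, cudaMemcpyHostToDevice);

    cublasCreate(&h);

    // Remember the current mode, then switch to device mode for the dot.
    cublasGetPointerMode(h, &saved);
    cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE);
    cublasStatus_t st1 = cublasDdot(h, n, x_, 1, x_, 1, dot_);

    // Back to the saved mode (host by default): &alpha is a host address.
    cublasSetPointerMode(h, saved);
    cublasStatus_t st2 = cublasDaxpy(h, n, &alpha, x_, 1, y_, 1);

    cudaError_t e1 = cudaMemcpy(y, y_, sz, cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(dot_out, dot_, sizeof(double),
                                cudaMemcpyDeviceToHost);

    cublasDestroy(h);
    cudaFree(x_);
    cudaFree(y_);
    cudaFree(dot_);
    return (st1 == CUBLAS_STATUS_SUCCESS && st2 == CUBLAS_STATUS_SUCCESS &&
            e1 == cudaSuccess && e2 == cudaSuccess) ? 0 : 1;
}

int main(void)
{
    double x[3] = {1.0, 2.0, 2.0};
    double y[3] = {1.0, 1.0, 1.0};
    double dot = 0.0;
    if (dot_then_axpy(x, y, 3, 2.0, &dot) != 0) {
        fprintf(stderr, "CUDA/cuBLAS call failed\n");
        return 1;
    }
    printf("dot = %f, y = %f %f %f\n", dot, y[0], y[1], y[2]);
    return 0;
}
```

The alternative is to stay in device mode and keep `alpha` in device memory as well, which avoids toggling the mode but means every scalar argument needs a device allocation.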

Josaph