Segmentation Fault in Pycuda using NVIDIA's cuSolver Library

Question

I'm trying to write a PyCUDA wrapper, inspired by the scikits-cuda library, for some operations provided by NVIDIA's new cuSolver library. First I need to perform an LU factorization with cusolverDnSgetrf(), but before that I need its 'Workspace' argument. cuSolver provides cusolverDnSgetrf_bufferSize() to get the required size, but when I call it the program crashes with a segmentation fault. What am I doing wrong?

Note: I already have this operation working with scikits-cuda, but the cuSolver library uses this kind of argument a lot, and I want to compare the usage between scikits-cuda and my implementation with the new library.

import numpy as np
import pycuda.autoinit  # creates the CUDA context
import pycuda.gpuarray as gpuarray
import ctypes
import ctypes.util

libcusolver = ctypes.cdll.LoadLibrary('libcusolver.so')

class _types:
  handle = ctypes.c_void_p

libcusolver.cusolverDnCreate.restype = int
libcusolver.cusolverDnCreate.argtypes = [_types.handle]

def cusolverCreate():
    handle = _types.handle()
    libcusolver.cusolverDnCreate(ctypes.byref(handle))
    return handle.value

libcusolver.cusolverDnDestroy.restype = int
libcusolver.cusolverDnDestroy.argtypes = [_types.handle]

def cusolverDestroy(handle):
    libcusolver.cusolverDnDestroy(handle)


libcusolver.cusolverDnSgetrf_bufferSize.restype = int
libcusolver.cusolverDnSgetrf_bufferSize.argtypes =[_types.handle,
                                       ctypes.c_int,
                                       ctypes.c_int,
                                       ctypes.c_void_p,
                                       ctypes.c_int,
                                       ctypes.c_void_p]

def cusolverLUFactorization(handle, matrix):
    m,n=matrix.shape
    mtx_gpu = gpuarray.to_gpu(matrix.astype('float32'))
    work=gpuarray.zeros(1, np.float32)
    status=libcusolver.cusolverDnSgetrf_bufferSize(
                          handle, m, n,
                          int(mtx_gpu.gpudata),
                          n, int(work.gpudata))
    print(status)


x = np.asarray(np.random.rand(3, 3), np.float32)
handle_solver=cusolverCreate()
cusolverLUFactorization(handle_solver,x)
cusolverDestroy(handle_solver)

The handle type you have defined is wrong. It should not be a pointer to void. That makes no sense — talonmies, Apr 21 '15 at 16:45
From the [cuSolver documentation](http://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDNhandle): "This is a pointer type to an opaque cuSolverDN context, which the user must initialize by calling cusolverDnCreate() prior to calling any other library function." [Scikits-cuda](http://scikit-cuda.readthedocs.org/en/latest/_modules/scikits/cuda/cublas.html#cublasCreate) uses the same definition I'm using, but for the CUBLAS library. I hardcoded it for CUBLAS too, and it works smoothly. – Miguel Diaz, Apr 21 '15 at 17:09

score 2 · Accepted Answer · answered Apr 21 '15 at 21:07

The last parameter of cusolverDnSgetrf_bufferSize should be a regular host pointer (to an int that receives the workspace size), not a GPU memory pointer. Try modifying the cusolverLUFactorization() function as follows:

def cusolverLUFactorization(handle, matrix):
    m, n = matrix.shape
    mtx_gpu = gpuarray.to_gpu(matrix.astype('float32'))

    # Host-side int that receives the required workspace size
    work = ctypes.c_int()
    status = libcusolver.cusolverDnSgetrf_bufferSize(
                         handle, m, n,
                         int(mtx_gpu.gpudata),
                         n, ctypes.pointer(work))
    print(status)      # return status of the call
    print(work.value)  # workspace size reported by cuSolver
answered Apr 21 '15 at 21:07

lebedov

good catch. And that regular pointer is used because it should be pointing to a host variable (`Lwork`) that will contain the *size* of the temporary workspace needed by a subsequent call to `cusolverDnSgetrf`, so make sure that the [`Workspace` pointer](http://docs.nvidia.com/cuda/cusolver/index.html#cuds-lt-t-gt-getrf) in the subsequent call points to a device allocated space of `Lwork` size. – Robert Crovella Apr 22 '15 at 04:23
Ha! Thank you very much, lebedov, you are the man! And @Robert Crovella, what you're pointing out is totally right too. I'll edit lebedov's answer to include the next step in the process, which is to actually perform the LU factorization, in case someone needs it. – Miguel Diaz Apr 22 '15 at 15:27
@MiguelDiaz: might be preferable to just link to the [answer](http://stackoverflow.com/questions/29780180/getrs-function-of-cusolver-over-pycuda-doesnt-work-properly) I posted in response to your other question. – lebedov Apr 22 '15 at 16:41
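
The next step described in the comments (query the size on the host, then allocate the workspace on the device) might look roughly like the sketch below. It is not the code from the linked answer. `check_status` and `lu_factorize` are hypothetical helper names, and the sketch assumes a CUDA device, PyCUDA, and a `lib` object obtained as in the question via `ctypes.cdll.LoadLibrary('libcusolver.so')`. It also assumes that `Lwork` counts float elements rather than bytes and that the matrix must be uploaded in column-major order, so treat both as points to verify against the cuSolver documentation.

```python
import ctypes


def check_status(status, name):
    """Raise if a cuSolver call returned anything but 0 (CUSOLVER_STATUS_SUCCESS)."""
    if status != 0:
        raise RuntimeError("%s failed with status %d" % (name, status))


def lu_factorize(lib, handle, matrix):
    """Sketch: query the workspace size, allocate it on the device, run getrf."""
    # Imported lazily so the module can be loaded on a machine without CUDA.
    import numpy as np
    import pycuda.autoinit  # noqa: F401, creates the CUDA context
    import pycuda.gpuarray as gpuarray

    m, n = matrix.shape
    # Upload the transpose so the device buffer is column-major with lda = m.
    a_gpu = gpuarray.to_gpu(np.ascontiguousarray(matrix.T, dtype=np.float32))
    a_ptr = ctypes.c_void_p(int(a_gpu.gpudata))
    h = ctypes.c_void_p(handle)
    lda = m

    lwork = ctypes.c_int()  # host variable, filled in by bufferSize
    check_status(lib.cusolverDnSgetrf_bufferSize(
        h, m, n, a_ptr, lda, ctypes.byref(lwork)),
        "cusolverDnSgetrf_bufferSize")

    # These three buffers live on the device.
    workspace = gpuarray.empty(lwork.value, np.float32)
    pivots = gpuarray.empty(min(m, n), np.int32)
    info = gpuarray.zeros(1, np.int32)

    check_status(lib.cusolverDnSgetrf(
        h, m, n, a_ptr, lda,
        ctypes.c_void_p(int(workspace.gpudata)),
        ctypes.c_void_p(int(pivots.gpudata)),
        ctypes.c_void_p(int(info.gpudata))),
        "cusolverDnSgetrf")

    # a_gpu now holds the L and U factors in column-major layout.
    return a_gpu, pivots.get(), int(info.get()[0])
```

With the question's helpers this would be called as `lu_factorize(libcusolver, handle_solver, x)`. Every pointer is wrapped in `ctypes.c_void_p` explicitly because, without `argtypes`, ctypes passes a plain Python integer as a C `int`, which can truncate a 64-bit address. An `info` value of 0 should indicate that the factorization completed.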
