Questions tagged [cublas]

The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library for use with CUDA-capable GPUs.

The cuBLAS library is an implementation of the standard BLAS (Basic Linear Algebra Subprograms) API on top of the NVIDIA CUDA runtime.

Since CUDA 4.0, the library has contained implementations of all 152 standard BLAS routines, supporting single-precision real and complex arithmetic on all CUDA-capable devices, and double-precision real and complex arithmetic on those devices with double-precision support. The library includes host API bindings for C and Fortran, and CUDA 5.0 introduced a device API for use from within CUDA kernels.

The library is shipped in every version of the CUDA toolkit and has a dedicated homepage at http://developer.nvidia.com/cuda/cublas.
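
For readers new to the library, the host API follows a create-handle / copy-in / compute / copy-out pattern. A minimal sketch (error checking omitted for brevity; assumes the cublas_v2.h and cuda_runtime.h headers shipped with the toolkit):

```c
// saxpy_cublas.c - minimal cuBLAS host-API sketch: y = alpha*x + y on the GPU
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    const int n = 4;
    const float alpha = 2.0f;
    float h_x[] = {1, 2, 3, 4}, h_y[] = {10, 20, 30, 40};

    float *d_x, *d_y;
    cudaMalloc((void**)&d_x, n * sizeof(float));
    cudaMalloc((void**)&d_y, n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);                                 // every cuBLAS call needs a handle

    cublasSetVector(n, sizeof(float), h_x, 1, d_x, 1);     // host -> device copies
    cublasSetVector(n, sizeof(float), h_y, 1, d_y, 1);

    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);        // y = alpha*x + y

    cublasGetVector(n, sizeof(float), d_y, 1, h_y, 1);     // device -> host copy
    for (int i = 0; i < n; ++i) printf("%f\n", h_y[i]);    // expect 12 24 36 48

    cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```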

330 questions
53 votes · 10 answers

Tensorflow crashes with CUBLAS_STATUS_ALLOC_FAILED

I'm running tensorflow-gpu on Windows 10 with a simple MNIST neural network program. When it tries to run, it encounters a CUBLAS_STATUS_ALLOC_FAILED error. A Google search doesn't turn up anything. I…
Axiverse • 1,589 • 3 • 14 • 30
24 votes · 2 answers

Clarification of the leading dimension in CUBLAS when transposing

For a matrix A, the documentation only states that the corresponding leading dimension parameter lda refers to the "leading dimension of two-dimensional array used to store the matrix A". Thus I presume this is just the number of rows of A, given…
mchen • 9,808 • 17 • 72 • 125
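
For background on what answers to this kind of question typically point out: lda describes how the array is stored (the stride in elements between consecutive columns in cuBLAS's column-major layout), and it does not change when CUBLAS_OP_T is requested. A small hypothetical helper to illustrate:

```c
#include <cublas_v2.h>

/* Hypothetical helper: computes C = A^T * A for a rows x cols column-major
 * matrix A already resident on the device. lda is the number of rows A was
 * stored with; it stays the same under CUBLAS_OP_T, because the transpose is
 * a property of the operation, not of the storage layout. */
void gram_matrix(cublasHandle_t handle, const float *d_A, int rows, int cols, float *d_C)
{
    const float alpha = 1.0f, beta = 0.0f;
    const int lda = rows;                      /* leading dimension of the stored array */
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                cols, cols, rows,              /* dimensions of op(A) * op(B) */
                &alpha, d_A, lda,              /* lda stays = rows under OP_T */
                d_A, lda,
                &beta, d_C, cols);             /* C is cols x cols, so ldc = cols */
}
```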
17 votes · 1 answer

non-square C-order matrices in cuBLAS (numba)

I'm trying to use the cuBLAS functions in Anaconda's Numba package and am having an issue. I need the input matrices to be in C order. The output can be in Fortran order. I can run the example script provided with the package, here. The script has two…
user1554752 • 707 • 2 • 10 • 24
17 votes · 7 answers

tensorflow running error with cublas

After successfully installing TensorFlow on the cluster, I immediately ran the MNIST demo to check that everything was working, but I ran into a problem. I don't know what this is all about, but it looks like the error is coming from CUDA: python3 -m…
Pengqi Lu • 231 • 1 • 3 • 5
17 votes · 3 answers

Could a CUDA kernel call a cublas function?

I know it sounds weird, but here is my scenario: I need to do a matrix-matrix multiplication (A(n*k)*B(k*n)), but I only need the diagonal elements of the output matrix to be evaluated. I searched the cuBLAS library and didn't find any level 2 or 3…
Hailiang Zhang • 17,604 • 23 • 71 • 117
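
Since only the diagonal of A(n×k)·B(k×n) is needed, a common suggestion is to skip GEMM entirely and compute the n dot products with a plain CUDA kernel. A hypothetical sketch, assuming row-major A and B already resident on the device:

```c
// Hypothetical sketch: compute only diag(A*B) where A is n x k and B is k x n,
// both stored row-major on the device. One thread per diagonal element.
__global__ void diag_of_product(const float *A, const float *B,
                                float *diag, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float sum = 0.0f;
        for (int j = 0; j < k; ++j)
            sum += A[i * k + j] * B[j * n + i];   // row i of A dot column i of B
        diag[i] = sum;
    }
}

// Launch example (assumes d_A, d_B, d_diag already allocated and filled):
// diag_of_product<<<(n + 255) / 256, 256>>>(d_A, d_B, d_diag, n, k);
```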
16 votes · 2 answers

Simple CUBLAS Matrix Multiplication Example?

I'm looking for a very bare bones matrix multiplication example for CUBLAS that can multiply M times N and place the results in P for the following code, using high-performance GPU operations: float M[500][500], N[500][500], P[500][500]; for(int i =…
Chris Redford • 16,982 • 21 • 89 • 109
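
A minimal sketch of what such an example typically looks like. One caveat: cuBLAS assumes column-major storage, so interpreting the row-major C arrays as column-major transposes them; computing N·M in that interpretation leaves M·N in row-major order in P. The helper name gpu_matmul and the omission of error checking are assumptions made for brevity:

```c
// Hypothetical sketch: P = M * N for square row-major float arrays via cublasSgemm.
#include <cuda_runtime.h>
#include <cublas_v2.h>

#define DIM 500

void gpu_matmul(const float *h_M, const float *h_N, float *h_P)
{
    size_t bytes = (size_t)DIM * DIM * sizeof(float);
    float *d_M, *d_N, *d_P;
    cudaMalloc((void**)&d_M, bytes);
    cudaMalloc((void**)&d_N, bytes);
    cudaMalloc((void**)&d_P, bytes);
    cudaMemcpy(d_M, h_M, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_N, h_N, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    // Operand order is N first, then M: in the column-major interpretation this
    // computes (M*N)^T, which is exactly M*N when read back as row-major.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, DIM, DIM, DIM,
                &alpha, d_N, DIM, d_M, DIM, &beta, d_P, DIM);

    cudaMemcpy(h_P, d_P, bytes, cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(d_M); cudaFree(d_N); cudaFree(d_P);
}

// Usage with the arrays from the question: gpu_matmul(&M[0][0], &N[0][0], &P[0][0]);
```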
15 votes · 1 answer

First tf.session.run() performs dramatically different from later runs. Why?

Here's an example to clarify what I mean: First session.run(): First run of a TensorFlow session Later session.run(): Later runs of a TensorFlow session I understand TensorFlow is doing some initialization here, but I'd like to know where in the…
12 votes · 3 answers

ValueError: libcublas.so.*[0-9] not found in the system path

I'm trying to import and use the ultralytics library in my Django REST Framework project. I use poetry as my dependency manager; I installed ultralytics using poetry add ultralytics, and on trying to import the library in my code I receive this…
12 votes · 2 answers

Matrix-vector multiplication in CUDA: benchmarking & performance

I'm updating my question with some new benchmarking results (I also reformulated the question to be more specific and I updated the code)... I implemented a kernel for matrix-vector multiplication in CUDA C following the CUDA C Programming Guide…
Pantelis Sopasakis • 1,902 • 5 • 26 • 45
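
For reference, the library routine such hand-written kernels are usually benchmarked against is cublasSgemv. A hypothetical wrapper (column-major A assumed; the helper name is illustrative only):

```c
#include <cublas_v2.h>

/* Hypothetical baseline for the benchmark: y = A*x with cublasSgemv,
 * where d_A is an m x n matrix stored column-major on the device. */
void gemv_baseline(cublasHandle_t handle, const float *d_A, const float *d_x,
                   float *d_y, int m, int n)
{
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemv(handle, CUBLAS_OP_N, m, n,
                &alpha, d_A, m,            /* lda = m for column-major storage */
                d_x, 1, &beta, d_y, 1);
}
```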
11 votes · 1 answer

Asynchronous cuBLAS calls

I want to make calls to cuBLAS routines asynchronously. Is it possible? If yes, how can I achieve that?
user1439690 • 659 • 1 • 11 • 26
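
The short version of the usual answers: kernel launches made by cuBLAS are already asynchronous with respect to the host, and cublasSetStream controls which stream subsequent calls are issued into, so independent calls can overlap. A hypothetical sketch (the helper name and the SAXPY workload are chosen only for illustration):

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Issue two independent SAXPYs on separate streams so they can overlap. */
void async_saxpys(cublasHandle_t handle, int n, const float *alpha,
                  float *d_x1, float *d_y1, float *d_x2, float *d_y2)
{
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    cublasSetStream(handle, s1);
    cublasSaxpy(handle, n, alpha, d_x1, 1, d_y1, 1);   /* queued on s1 */

    cublasSetStream(handle, s2);
    cublasSaxpy(handle, n, alpha, d_x2, 1, d_y2, 1);   /* queued on s2 */

    /* ... host is free to do other work here ... */
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}
```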
10 votes · 1 answer

How to transpose a matrix in an optimal way using blas?

I'm doing some calculations and analyzing the strengths and weaknesses of different BLAS implementations. However, I have come across a problem. I'm testing cuBLAS; doing linear algebra on the GPU would seem like a good idea, but there is one…
Martin Kristiansen • 9,875 • 10 • 51 • 83
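
The usual suggestion for an out-of-place transpose on the GPU is cublasSgeam, a cuBLAS extension rather than standard BLAS. A hypothetical sketch, assuming a column-major m×n input already on the device:

```c
#include <cublas_v2.h>

/* Hypothetical sketch: out-of-place transpose B = A^T with cublasSgeam.
 * A is m x n, column-major; d_B receives the n x m transpose. */
void transpose(cublasHandle_t handle, const float *d_A, float *d_B, int m, int n)
{
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                n, m,                   /* dimensions of the result */
                &alpha, d_A, m,         /* op(A) = A^T, A stored with lda = m */
                &beta,  d_B, n,         /* B term is not read since beta = 0 */
                d_B, n);
}
```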
10 votes · 1 answer

cublasSetVector() vs cudaMemcpy()

I am wondering if there is a difference between: // cumalloc.c - Create a device on the device HOST float * cudamath_vector(const float * h_vector, const int m) { float *d_vector = NULL; cudaError_t cudaStatus; cublasStatus_t cublasStatus; …
Stefan Falk • 23,898 • 50 • 191 • 378
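
For contiguous (unit-stride) data the two calls move the same bytes; cublasSetVector mainly adds element-size and stride bookkeeping and reports failures as a cublasStatus_t rather than a cudaError_t. A small sketch showing both side by side (the helper name is illustrative only):

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Both calls below copy n contiguous floats host -> device. */
void copy_both_ways(const float *h_vec, float *d_vec, int n)
{
    /* cuBLAS helper: n elements of sizeof(float), stride 1 on both sides */
    cublasStatus_t bstat = cublasSetVector(n, sizeof(float), h_vec, 1, d_vec, 1);

    /* Plain CUDA runtime equivalent for contiguous data */
    cudaError_t cstat = cudaMemcpy(d_vec, h_vec, n * sizeof(float),
                                   cudaMemcpyHostToDevice);

    (void)bstat; (void)cstat;   /* real code would check both status values */
}
```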
10 votes · 5 answers

Equivalent of cudaGetErrorString for cuBLAS?

CUDA runtime has a convenience function cudaGetErrorString(cudaError_t error) that translates an error enum into a readable string. cudaGetErrorString is used in the CUDA_SAFE_CALL(someCudaFunction()) macro that many people use for CUDA error…
solvingPuzzles • 8,541 • 16 • 69 • 112
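
Historically cuBLAS shipped no direct counterpart, so the common workaround is a hand-rolled switch over the cublasStatus_t enum (recent toolkits also document a cublasGetStatusString(), so check your cuBLAS version first). A sketch of the fallback:

```c
#include <cublas_v2.h>

/* Hand-rolled fallback mapping cublasStatus_t values to readable strings,
 * in the spirit of cudaGetErrorString. */
static const char *cublas_error_string(cublasStatus_t status)
{
    switch (status) {
        case CUBLAS_STATUS_SUCCESS:          return "CUBLAS_STATUS_SUCCESS";
        case CUBLAS_STATUS_NOT_INITIALIZED:  return "CUBLAS_STATUS_NOT_INITIALIZED";
        case CUBLAS_STATUS_ALLOC_FAILED:     return "CUBLAS_STATUS_ALLOC_FAILED";
        case CUBLAS_STATUS_INVALID_VALUE:    return "CUBLAS_STATUS_INVALID_VALUE";
        case CUBLAS_STATUS_ARCH_MISMATCH:    return "CUBLAS_STATUS_ARCH_MISMATCH";
        case CUBLAS_STATUS_MAPPING_ERROR:    return "CUBLAS_STATUS_MAPPING_ERROR";
        case CUBLAS_STATUS_EXECUTION_FAILED: return "CUBLAS_STATUS_EXECUTION_FAILED";
        case CUBLAS_STATUS_INTERNAL_ERROR:   return "CUBLAS_STATUS_INTERNAL_ERROR";
        default:                             return "unknown cuBLAS status";
    }
}
```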
8 votes · 1 answer

Why cuSparse is much slower than cuBlas for sparse matrix multiplication

Recently, when I used cuSPARSE and cuBLAS in CUDA Toolkit 6.5 to do sparse matrix multiplication, I found cuSPARSE to be much slower than cuBLAS in all cases! In all my experiments, I used cusparseScsrmm in cuSPARSE and cublasSgemm in cuBLAS. In the…
ROBOT AI • 1,217 • 3 • 16 • 27
8 votes · 2 answers

cuBLAS synchronization best practices

I read two posts on Stack Overflow, namely Will the cublas kernel functions automatically be synchronized with the host? and CUDA Dynamic Parallelizm; stream synchronization from device and they recommend the use of some synchronization API, e.g.,…
Pantelis Sopasakis • 1,902 • 5 • 26 • 45
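
The gist of the usual answers: a cuBLAS call returns control to the host before its kernel finishes, so synchronize the stream the handle is bound to before timing the call or reading its result. A hypothetical sketch using a device-side dot product (the helper name is an assumption for illustration):

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Launch a dot product asynchronously, then wait for it before reading back. */
void dot_then_sync(cublasHandle_t handle, cudaStream_t stream,
                   const float *d_x, const float *d_y, int n, float *h_result)
{
    float *d_result;
    cudaMalloc((void**)&d_result, sizeof(float));

    cublasSetStream(handle, stream);
    cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE); /* result stays on the GPU */
    cublasSdot(handle, n, d_x, 1, d_y, 1, d_result);          /* returns before completion */

    cudaStreamSynchronize(stream);                            /* wait for the kernel */
    cudaMemcpy(h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
}
```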