The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library for use with CUDA capable GPUs.
The CUBLAS library is an implementation of the standard BLAS (Basic Linear Algebra Subprograms) API on top of the NVIDIA CUDA runtime.
Since CUDA 4.0 was released, the library contains implementations of all 152 standard BLAS routines, supporting single precision real and complex arithmetic on all CUDA capable devices, and double precision real and complex arithmetic on those CUDA capable devices with double precision support. The library includes host API bindings for C and Fortran, and CUDA 5.0 introduces a device API for use with CUDA kernels.
The library is shipped in every version of the CUDA toolkit and has a dedicated homepage at http://developer.nvidia.com/cuda/cublas.