0

I'm performing QR decomposition in two different ways: using standard numpy method and using GEQRF LAPACK function implemented in CULA library. Here is simple example in python (PyCULA used to access CULA):

from PyCULA.cula import culaInitialize,culaShutdown
from PyCULA.cula import gpu_geqrf, gpu_orgqr

import numpy as np
import sys

def test_numpy(A):
    Q, R = np.linalg.qr(A)
    print "Q"
    print Q
    print "R"
    print R
    print "transpose(Q)*Q"
    print np.dot(np.transpose(Q), Q)
    print "Q*R"
    print np.dot(Q,R)

def test_cula(A):
    culaInitialize()
    QR, TAU = gpu_geqrf(A)
    R = np.triu(QR)
    Q = gpu_orgqr(QR, A.shape[0], TAU)
    culaShutdown()
    print "Q"
    print Q
    print "R"
    print R
    print "transpose(Q)*Q"
    print np.dot(np.transpose(Q), Q)
    print "Q*R"
    print np.dot(Q,R)

def main():
    rows = int(sys.argv[1])
    cols = int(sys.argv[2])
    A = np.array(np.ones((rows,cols)).astype(np.float64))
    print "A"
    print A
    print "NUMPY"
    test_numpy(A.copy())
    print "CULA"
    test_cula(A.copy())

if __name__ == '__main__':
    main()

It produces following output:

A
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
NUMPY
Q
[[-0.57735027 -0.57735027 -0.57735027]
 [-0.57735027  0.78867513 -0.21132487]
 [-0.57735027 -0.21132487  0.78867513]]
R
[[-1.73205081 -1.73205081 -1.73205081]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]]
transpose(Q)*Q
[[  1.00000000e+00   2.77555756e-17   0.00000000e+00]
 [  2.77555756e-17   1.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   1.00000000e+00]]
Q*R
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
CULA
Q
[[-0.57735027 -0.57735027 -0.57735027]
 [-0.57735027  0.78867513 -0.21132487]
 [-0.57735027 -0.21132487  0.78867513]]
R
[[-1.73205081  0.3660254   0.3660254 ]
 [-0.          0.          0.        ]
 [-0.          0.          0.        ]]
transpose(Q)*Q
[[  1.00000000e+00   2.77555756e-17   0.00000000e+00]
 [  2.77555756e-17   1.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   1.00000000e+00]]
Q*R
[[ 1.         -0.21132487 -0.21132487]
 [ 1.         -0.21132487 -0.21132487]
 [ 1.         -0.21132487 -0.21132487]]

What is wrong with my code?

grapescan
  • 251
  • 1
  • 5
  • 10
  • QR decomposition is not unique if the matrix is not invertible. – pv. Apr 24 '14 at 17:18
  • @pv. As you can see in my example, CULA produces invalid R matrix so Q*R not equals A. The same problem with invertible matrix (e.g. [[2, 2], [2, 3]]). – grapescan Apr 25 '14 at 09:52
  • I've only tested CULA a couple of times, but I found it to yield incorrect results on a number of tests (particularly computing the SVD of a matrix). I didn't investigate too much, but it looked to me like an issue using 32-bit vs 64-bit floats. – lmjohns3 Apr 28 '14 at 16:57

2 Answers2

1

I tested your example in R. CULA seems to provide the same results as R. Here is my code:

#include <Rcpp.h>
#include <cula.h>

// [[Rcpp::export]]
std::vector< float > gpuQR_cula( std::vector< float > x, const uint32_t nRows, const uint32_t nCols )
{       
    std::vector< float > tau( nCols ) ;

    culaInitialize() ;   
    culaSgeqrf( nRows, nCols, &x.front(), nRows, &tau.front() ) ;
    culaShutdown() ;

    Rcpp::Rcout << "Tau: " << tau[ 0 ] << ", " << tau[ 1 ] << ", " << tau[ 2 ] << "\n" ;

    for( uint32_t jj = 0 ; jj < nCols ; ++jj ) {
        for( uint32_t ii = 1 ; ii < nRows ; ++ii ) {
            if( ii > jj ) { x[ ii + jj * nRows ] *= tau[ jj ] ; }
        }
    }

    return x ;
}

Your matrix:

(A <- matrix(1, 3, 3))

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
n_row <- nrow(A)
n_col <- ncol(A)

Here are the results from CULA:

matrix(gpuQR_cula(c(A), n_row, n_col), n_row, n_col)

Tau: 1.57735, 0, 0
           [,1]      [,2]      [,3]
[1,] -1.7320509 -1.732051 -1.732051
[2,]  0.5773503  0.000000  0.000000
[3,]  0.5773503  0.000000  0.000000

Here are the results from R:

(qrA <- qr(A))
$qr
           [,1]      [,2]      [,3]
[1,] -1.7320508 -1.732051 -1.732051
[2,]  0.5773503  0.000000  0.000000
[3,]  0.5773503  0.000000  0.000000

$qraux
[1] 1.57735 0.00000 0.00000

Q <- qr.Q(qrA)
R <- qr.R(qrA)
crossprod(Q)

             [,1]         [,2]         [,3]
[1,] 1.000000e+00 4.163336e-17 5.551115e-17
[2,] 4.163336e-17 1.000000e+00 0.000000e+00
[3,] 5.551115e-17 0.000000e+00 1.000000e+00

Q %*% R
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1

I hope that helps!

chris
  • 461
  • 2
  • 10
1

This is a tricky, the issue here is that Python uses Row-major order, but CULA is using Column-major order as R does. Just check the CULA documentation for further details.

Here your example with scikit-cuda:

import numpy as np
import pycuda.gpuarray as gpuarray
import pycuda.autoinit
from skcuda import linalg
linalg.init()


# skcuda
A = np.ones( (3,3) )
A_gpu = gpuarray.to_gpu(np.array(A, order='F'))
Q , R = linalg.qr(A_gpu) 
Q, R = Q.get(), R.get()
print Q.dot(R) #recovers A
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]

print Q.T.dot(Q) # As expected
[[  1.00000000e+00  -5.55111512e-17   1.11022302e-16]
 [ -5.55111512e-17   1.00000000e+00  -2.22044605e-16]
 [  1.11022302e-16  -2.22044605e-16   1.00000000e+00]]

If you use instead (which is the default in Python)

A_gpu = gpuarray.to_gpu(np.array(A, order='C'))

you will get the same wrong results as you posted above.

This issue can cause several problems, so you have to be very careful and take care of the matrices order.

Cheers, Ben

benli
  • 11
  • 1