3

I am trying to do a Cholesky decomposition via pdpotrf() of MKL-Intel's library, which uses ScaLAPACK. I am reading the whole matrix in the master node and then distribute it like in this example. Everything works fine when the dimension of the SPD matrix is even. However, when it's odd, pdpotrf() thinks that the matrix is not positive definite.

Could it be because the submatrices are not SPD? I am working with this matrix: enter image description here

and the submatrices are (with 4 processes and blocks of size 2x2):

A_loc on node 0
  4   1   2
  1 0.5   0
  2   0  16

nrows = 3, ncols = 2
A_loc on node 1
  2 0.5
  0   0
  0   0

nrows = 2, ncols = 3
A_loc on node 2
  2   0   0
0.5   0   0

nrows = 2, ncols = 2
A_loc on node 3
  3   0
  0 0.625

Here, every submatrix is not SPD, however, the overall matrix is SPD (have checked with running with 1 process). What should I do? Or there is nothing I can do and pdpotrf() does not work with matrices of odd size?


Here is how I call the routine:

int iZERO = 0;
int descA[9];
// N, M dimensions of matrix. lda = N
// Nb, Mb dimensions of block
descinit_(descA, &N, &M, &Nb, &Mb, &iZERO, &iZERO, &ctxt, &lda, &info);
...
pdpotrf((char*)"L", &ord, A_loc, &IA, &JA, descA, &info);

I also tried this:

// nrows/ncols is the number of rows/columns a submatrix has
descinit_(descA, &N, &M, &nrows, &ncols, &iZERO, &iZERO, &ctxt, &lda, &info);

but I get an error:

{ 0, 0}: On entry to { 0, 1}: On entry to PDPOTR{ 1, 0}: On entry to PDPOTRF parameter number 605 had an illegal value { 1, 1}: On entry to PDPOTRF parameter number 605 had an illegal value F parameter number 605 had an illegal value

PDPOTRF parameter number 605 had an illegal value info < 0: If the i-th argument is an array and the j-entry had an illegal value, then INFO = -(i*100+j), if the i-th argument is a scalar and had an illegal value, then INFO = -i. info = -605

From my answer, you can see what the arguments of the function mean.


The code is based on this question. Output:

gsamaras@pythagoras:~/konstantis/check_examples$ ../../mpich-install/bin/mpic++ -o test minor.cpp -I../../intel/mkl/include ../../intel/mkl/lib/intel64/libmkl_scalapack_lp64.a       -Wl,--start-group       ../../intel/mkl/lib/intel64/libmkl_intel_lp64.a ../../intel/mkl/lib/intel64/libmkl_core.a  ../../intel/mkl/lib/intel64/libmkl_sequential.a    -Wl,--end-group ../../intel/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.a -lpthread       -lm     -ldl
gsamaras@pythagoras:~/konstantis/check_examples$ mpiexec -n 4 ./test
Processes grid pattern:
0 1
2 3
nrows = 3, ncols = 3
A_loc on node 0
  4   1   2
  1 0.5   0
  2   0  16

nrows = 3, ncols = 2
A_loc on node 1
  2 0.5
  0   0
  0   0

nrows = 2, ncols = 3
A_loc on node 2
  2   0   0
0.5   0   0

nrows = 2, ncols = 2
A_loc on node 3
  3   0
  0 0.625

Description init sucesss!
matrix is not positive definte
Matrix A result:
  2   1   2 0.5   2
0.5 0.5   0   0   0
  1   0   1   0 -0.25
0.25  -1 -0.5 0.625   0
  1  -1  -2 -0.5  14
Community
  • 1
  • 1
gsamaras
  • 71,951
  • 46
  • 188
  • 305
  • You might try the LAPACK version (dpotrf)... – Jeff Hammond Jul 05 '15 at 19:12
  • That's not distributed @Jeff. I have checked that with `dpotrf()` as well and it works. However, I am interested only in distributed solutions. – gsamaras Jul 05 '15 at 19:27
  • I am aware of that. The point was to get debugging insight. It seems you've done that. – Jeff Hammond Jul 05 '15 at 19:29
  • You could also try Elemental, which has a more friendly API, performs the same or better for Cholesky, and supports more than just dense matrices. – Jeff Hammond Jul 05 '15 at 19:31
  • Oh OK @Jeff, yes I had, should have mentioned, sorry, will edit. I know Elemental, and Jack Poulson who is the project leader is very friendly and good. However, I had to use ScaLAPACK for various reasons. So, now I have to see what's going on with `pdpotrf()`. Do you think that it can not handle odd dimensions? – gsamaras Jul 05 '15 at 19:31
  • I'm sure it can handle odd dimensions. My guess is that there is an error in your input arguments. Best thing to do is include a http://stackoverflow.com/help/mcve. – Jeff Hammond Jul 05 '15 at 19:35
  • @Jeff I updated the question. Hope the downvoter will see that too! EDIT: Guess (s)he did not! – gsamaras Jul 06 '15 at 11:23

1 Answers1

3

The issue may come from :

MPI_Bcast(&lda, 1, MPI_INT, 0, MPI_COMM_WORLD);

Before this line, lda is different on each process if the dimension of the matrix is odd. Two processes handle 2 rows and two processes handle 3 rows. But after the MPI_Bcast(), lda is the same everywhere (3).

The problem is that the argument lda of the subroutine DESCINIT must be the leading dimension of the local array, that is either 2 or 3.

By commenting MPI_Bcast(), i got:

Description init sucesss!
SUCCESS
Matrix A result:
2   1   2 0.5   2 
0.5 0.5   0   0   0 
1  -1   1   0   0 
0.25 -0.25 -0.5 0.5   0 
1  -1  -2  -3   1 

At last, it would explain that the program works well for even dimensions and fails for odd dimensions !

francis
  • 9,525
  • 2
  • 25
  • 41
  • Oh, I thought of the global matrix, good job Francis. Nice top answer btw. Dacor :P There is also this link: https://software.intel.com/en-us/forums/topic/561007 – gsamaras Jul 08 '15 at 10:53