
I am currently using Cython, but my code involves taking the norm of each row/column combination multiple times in a loop, so even a 1000x1000 matrix is extremely slow. I am thinking of switching to C + LAPACK. What's the standard approach for working with huge matrices?

It looks like the most expensive steps involve taking the dot product of all row/column combinations and looping over the upper triangle of the matrix.

The matrices are not sparse.

methane

2 Answers


Try NumPy; 1000x1000 sounds kinda small for it.
It is implemented in C and can take advantage of optimized linear algebra libraries, so it can be fast. Read this.
If there aren't many nonzero entries, use sparse matrices.

import numpy as np
A = np.random.rand(1000,1000)
np.linalg.norm(A, axis=1)    # takes me under 5 ms

You should provide more details if you want more specific help.

Prashant Kumar
    +1, but OP said `10000x10000`, which is pretty big even for numpy. For me, `timeit np.linalg.norm(A, axis=1)` gives `1.59 s per loop` – askewchan Nov 26 '13 at 20:14
  • Ah, I got lost in all the zeros. 10000x10000 is pretty big, so maybe another venue besides Numpy should be investigated, like C code/libraries. But it is telling that he says 1000x1000 is extremely slow. Sounds to me like the code needs to take better advantage of vectorization. – Prashant Kumar Nov 26 '13 at 20:19
  • Yeah, I agree. If they get bigger though, it's memory that's an issue not just speed, and something like PyTables might be necessary. – askewchan Nov 26 '13 at 20:21
  • Hm, actually, `timeit np.sqrt(np.einsum('ij,ij->i', A, A))` gives `147 ms per loop` – askewchan Nov 26 '13 at 20:26
  • For `n=1e5; np.random.rand(n, n)`, sqrt-einsum takes me just `45 ms per loop`. But using `linalg.norm`, my ipython session dies! – Prashant Kumar Nov 26 '13 at 21:24
  • Ha, funny that your `einsum` is faster, but my `linalg.norm` at least survives :P – askewchan Nov 26 '13 at 21:28

In C, as a general guideline and with no sample code to look at, the main thing you can do to optimize matrix operations is to use contiguous memory blocks, so that whole matrices can be kept in the processor's cache (or at least so that RAM traffic is reduced to a minimum). That is, if you are dynamically allocating memory, request a single block for each matrix, and then either compute the indexes arithmetically:

for (i = 0; i < rows; i++)
{
  for (j = 0; j < columns; j++)
  {
    matrix[i*columns + j] = do_whatever();  /* row-major: row i starts at offset i*columns */
  }
}

or create a set of pointers to the beginning of each row if you prefer to use the standard [i][j] notation. This approach can reduce performance slightly, since the processor has to handle two arrays instead of one for a single matrix; if you are using plain static arrays, you won't have to worry about it.
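A minimal sketch of that second approach (the function names here are made up for illustration): one contiguous allocation holds the data, and a separate array of row pointers makes `m[i][j]` indexing work on top of it.

```c
#include <stdlib.h>

/* Allocate rows x cols doubles as ONE contiguous block, plus an array of
 * row pointers so the matrix can be indexed as m[i][j]. */
double **alloc_matrix(size_t rows, size_t cols)
{
    double *block = malloc(rows * cols * sizeof *block);
    double **m = malloc(rows * sizeof *m);
    if (!block || !m) { free(block); free(m); return NULL; }
    for (size_t i = 0; i < rows; i++)
        m[i] = block + i * cols;   /* row i starts at offset i*cols */
    return m;
}

void free_matrix(double **m)
{
    if (m) { free(m[0]); free(m); }  /* m[0] is the contiguous block */
}
```

Because the data itself is still one block, cache behavior stays the same as with arithmetic indexing; only the extra pointer lookup is added.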

The other important change you can make is to parallelize your calculations across multiple threads.

Working with large matrices is inherently expensive, and further optimization tricks can only be applied if certain assumptions about the data can be made, like symmetry or some other property that could save you some operations.