
I have a number of indices and values that make up a scipy.coo_matrix. The indices/values are generated from different subroutines and are concatenated together before being handed over to the matrix constructor:

import numpy
from scipy import sparse

n = 100000

I0 = range(n)
J0 = range(n)
V0 = numpy.random.rand(n)

I1 = range(n)
J1 = range(n)
V1 = numpy.random.rand(n)

# [...]

I = numpy.concatenate([I0, I1])
J = numpy.concatenate([J0, J1])
V = numpy.concatenate([V0, V1])

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))

Now, the components of (I, J, V) can be quite large such that the concatenate operations become significant. (In the above example it takes over 20% of the runtime on my machine.) I'm reading that it's not possible to concatenate without a copy.

Is there a way for handing over indices and values without copying the input data around first?

Nico Schlömer

2 Answers


If you look at the code for coo_matrix.__init__ you'll see that it's pretty simple. In fact, if the (V, (I, J)) inputs are right, it will simply assign those three arrays to its .data, .row, and .col attributes. You can even check that after creation by comparing those attributes with your variables.

If they aren't 1d arrays of the right dtype, it will massage them: make them arrays, cast the dtype, and so on. So, without getting into details, processing that you do beforehand might save time in the coo call.

            self.row = np.array(row, copy=copy, dtype=idx_dtype)
            self.col = np.array(col, copy=copy, dtype=idx_dtype)
            self.data = np.array(obj, copy=copy)

One way or another, those attributes will each have to be a single array, not a loose list of arrays or list of lists.
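A quick way to see that assignment in action (a sketch; whether the buffers are actually shared depends on your scipy version and on whether the index dtype already matches scipy's internal idx_dtype):

```python
import numpy as np
from scipy import sparse

n = 1000
# int32 often matches scipy's idx_dtype for matrices this small
I = np.arange(n, dtype=np.int32)
J = np.arange(n, dtype=np.int32)
V = np.random.rand(n)

m = sparse.coo_matrix((V, (I, J)), shape=(n, n))

# The attributes hold the same values as the inputs; if the dtypes
# already matched, no copy may have been made at all.
print(np.array_equal(m.row, I))
print(np.shares_memory(m.data, V))
```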

sparse.bmat makes a coo matrix from other ones. It collects their coo attributes, joins them in the allocate-an-empty-array-and-fill-it style, and calls coo_matrix. Look at its code.
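A small sketch of that assembly (the block values here are made up for illustration):

```python
import numpy as np
from scipy import sparse

A = sparse.coo_matrix(np.eye(2))
B = sparse.coo_matrix(2 * np.eye(3))

# bmat gathers row/col/data from each block, shifts the indices into
# place, and calls coo_matrix once on the joined arrays
C = sparse.bmat([[A, None], [None, B]])
print(C.shape)  # (5, 5)
```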

Almost all numpy operations that return a new array do so by allocating an empty array and filling it. Letting numpy do that in compiled code (with np.concatenate) should be a little faster, but details like the size and number of inputs will make a difference.
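Spelled out, np.concatenate is essentially the same allocate-and-fill pattern done by hand:

```python
import numpy as np

a = np.random.rand(5)
b = np.random.rand(5)

# concatenate: allocates a new array and copies both inputs into it
c1 = np.concatenate([a, b])

# the same thing by hand: allocate empty, fill the slices
c2 = np.empty(a.size + b.size)
c2[:a.size] = a
c2[a.size:] = b

print(np.array_equal(c1, c2))  # True
```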

A non-canonical coo matrix is just the start. Many operations require a conversion to one of the other formats.
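For instance, a coo matrix happily keeps duplicate (row, col) entries, and it's the conversion to csr that sums them (a small illustration with made-up values):

```python
import numpy as np
from scipy import sparse

# two entries at the same (0, 0) position
I = np.array([0, 0, 1])
J = np.array([0, 0, 1])
V = np.array([1.0, 2.0, 3.0])

m = sparse.coo_matrix((V, (I, J)), shape=(2, 2))
print(m.nnz)            # 3 -- the coo matrix keeps the duplicates
print(m.tocsr()[0, 0])  # 3.0 -- converting to csr sums them
```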


Efficiently construct FEM/FVM matrix: this is about sparse matrix construction where many duplicate entries need to be summed, and about using the csr format for calculations.

hpaulj

You can try pre-allocating the arrays. It'll spare you the copy at least. I didn't see any speedup for the example, but you might see a change.

import numpy
from scipy import sparse

n = 100000

I = numpy.empty(2*n, dtype=numpy.intp)  # integer dtype for the indices
J = numpy.empty_like(I)
V = numpy.empty(2*n, dtype=numpy.double)
I[:n] = range(n)
J[:n] = range(n)
V[:n] = numpy.random.rand(n)

I[n:] = range(n)
J[n:] = range(n)
V[n:] = numpy.random.rand(n)
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
Elliot
  • I'm pretty sure `concatenate` will allocate one large array and then copy the existing arrays into it. This should write the arrays into the preallocated array directly. – Elliot Feb 27 '17 at 15:16
  • 1
    I'm not sure about range, but with any realworld r.h.s. you won't save a copy like that. `numpy.random.rand(n)` will first create a temporary which is then copied into `V`. You may save a bit of memory because you can discard the earlier bits once they are inserted but that's it. The only way to make this work is to have the r.h.s. work in-place. Some numpy functions support this through the `out` keyword. – Paul Panzer Feb 27 '17 at 22:35
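A sketch of the `out=` approach Paul Panzer mentions (not from the original answer), using numpy's Generator API, whose `random` method accepts an `out` argument:

```python
import numpy as np

n = 4
V = np.empty(2 * n)

# Generator.random writes directly into the preallocated slices,
# so no temporary array is created on the right-hand side
rng = np.random.default_rng()
rng.random(out=V[:n])
rng.random(out=V[n:])

print(((0 <= V) & (V < 1)).all())  # True
```

Note that `out` must be a contiguous float64 array; a basic slice of a contiguous array, as above, qualifies.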