I have a generator that yields one-dimensional `numpy.array`s of the same length. I would like to have a sparse matrix containing that data. Rows are generated in the same order I'd like to have them in the final matrix. A `csr_matrix` is preferable to a `lil_matrix`, but I assume the latter will be easier to build in the scenario I'm describing.
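For reference, incremental construction with `lil_matrix` could look like the sketch below. It assumes the number of rows is known up front (`fill_lil` and `n_rows` are hypothetical names, not part of SciPy), since a plain generator does not report its length. The result is converted to CSR once at the end.

```python
import numpy as np
import scipy.sparse


def row_gen():
    # Toy generator from the question.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])


def fill_lil(rows, n_rows, n_cols, dtype=np.int64):
    """Fill a preallocated lil_matrix one row at a time.

    Hypothetical helper: assumes n_rows is known in advance, because a
    generator does not expose how many items it will yield.
    """
    m = scipy.sparse.lil_matrix((n_rows, n_cols), dtype=dtype)
    for i, row in enumerate(rows):
        m[i, :] = row  # LIL is designed for incremental row-wise assignment
    return m


lil = fill_lil(row_gen(), n_rows=3, n_cols=3)
csr = lil.tocsr()  # convert once at the end if CSR is the target format
```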
Assuming `row_gen` is a generator yielding `numpy.array` rows, the following code works as expected:
```python
import numpy
import scipy.sparse

def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

matrix = scipy.sparse.lil_matrix(list(row_gen()))
```
Because building the list defeats the purpose of using a generator, I'd like the following to produce the same result. More specifically, I cannot hold the entire dense matrix (or a list of all matrix rows) in memory:
```python
import numpy
import scipy.sparse

def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

matrix = scipy.sparse.lil_matrix(row_gen())
```
However, it raises the following exception when run:

```
TypeError: no supported conversion for types: (dtype('O'),)
```
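The `dtype('O')` (object dtype) in the message hints at the cause. As far as I can tell, NumPy does not consume a generator when converting it to an array; it wraps the generator object itself in a 0-d object array, which is presumably what SciPy then fails to convert. A minimal check:

```python
import numpy as np


def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])


# The generator is not iterated: the result is a 0-d array whose single
# element is the generator object, hence dtype object ('O').
wrapped = np.asarray(row_gen())
print(wrapped.shape, wrapped.dtype)  # () object
```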
I also noticed that the traceback includes the following:

```
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 122, in __init__
    A = csr_matrix(A, dtype=dtype).tolil()
```
This makes me think that `scipy.sparse.lil_matrix` will end up creating a `csr_matrix` and only then converting it to a `lil_matrix`. In that case I would rather just create a `csr_matrix` to begin with.
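One way to skip the intermediate LIL step is to assemble CSR's three arrays (`data`, `indices`, `indptr`) while consuming the generator, then pass them to the `csr_matrix((data, indices, indptr), shape=...)` constructor. A minimal sketch, assuming every row has length `n_cols` (`csr_from_rows` is a hypothetical helper name):

```python
import numpy as np
import scipy.sparse


def row_gen():
    # Toy generator from the question.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])


def csr_from_rows(rows, n_cols, dtype=np.float64):
    """Build a csr_matrix from an iterable of 1-D arrays of length n_cols.

    Hypothetical helper: only nonzero values and their column indices are
    kept, so each dense row can be discarded once it has been processed.
    """
    data_chunks = []   # nonzero values, one small array per row
    index_chunks = []  # column index of each nonzero, one array per row
    indptr = [0]       # row i occupies data[indptr[i]:indptr[i + 1]]
    for row in rows:
        row = np.asarray(row)
        if row.shape != (n_cols,):
            raise ValueError("expected a 1-D row of length %d" % n_cols)
        cols = np.flatnonzero(row)
        data_chunks.append(row[cols].astype(dtype))
        index_chunks.append(cols)
        indptr.append(indptr[-1] + cols.size)

    if data_chunks:
        data = np.concatenate(data_chunks)
        indices = np.concatenate(index_chunks)
    else:  # empty input: np.concatenate raises on an empty list
        data = np.empty(0, dtype=dtype)
        indices = np.empty(0, dtype=np.intp)

    n_rows = len(indptr) - 1
    return scipy.sparse.csr_matrix(
        (data, indices, np.asarray(indptr)), shape=(n_rows, n_cols)
    )


matrix = csr_from_rows(row_gen(), n_cols=3)
print(matrix.toarray())
```

Peak memory is roughly one dense row plus the sparse data, with a temporary extra copy of the sparse data during the final concatenation.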
To recap, my question is: what is the most efficient way to create a `scipy.sparse` matrix from a Python generator of one-dimensional NumPy arrays?
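A simpler, if probably slower, option is to convert each incoming row to a 1 x N `csr_matrix` and stack them once at the end with `scipy.sparse.vstack`. Only the sparse rows are kept in memory, but creating one small matrix object per row adds overhead, so for many rows, assembling the `data`/`indices`/`indptr` arrays directly is likely faster. A sketch:

```python
import numpy as np
import scipy.sparse


def row_gen():
    # Toy generator from the question.
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])


# Each dense 1-D row becomes a 1 x N sparse row as soon as it is produced,
# so only sparse rows accumulate in the list, never dense ones.
sparse_rows = [scipy.sparse.csr_matrix(row) for row in row_gen()]
matrix = scipy.sparse.vstack(sparse_rows, format="csr")
```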