I am using scipy.sparse
in my application and want to run some performance tests. To do that, I need to create a large sparse matrix (which I will then use in my application). As long as the matrix is small, I can create it with
import scipy.sparse as sp
a = sp.rand(1000,1000,0.01)
which results in a 1000 by 1000 matrix with 10,000 nonzero entries (a reasonable density, meaning approximately 10 nonzero entries per row).
The problem arises when I try to create a larger matrix, for example a 100,000 by 100,000 one (I have dealt with far larger matrices before). I run
import scipy.sparse as sp
N = 100000
d = 0.0001
a = sp.rand(N, N, d)
which should result in a 100,000 by 100,000 matrix with one million nonzero entries (well within the realm of the possible). Instead, I get an error message:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    sp.rand(100000,100000,0.0000001)
  File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 723, in rand
    j = random_state.randint(mn)
  File "mtrand.pyx", line 935, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10327)
OverflowError: Python int too large to convert to C long
which seems to be an internal scipy
error that I cannot work around.
I understand that I could create a 10n by 10n matrix by creating one hundred n by n matrices and stacking them together. However, I think scipy.sparse
should be able to handle the creation of large sparse matrices directly (again, 100k by 100k is by no means large, and scipy
is more than comfortable handling matrices with several million rows). Am I missing something?
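For concreteness, the stacking workaround I mean looks roughly like this (a sketch using sp.bmat to assemble a 10 by 10 grid of independent random blocks; I scaled n down to 1000 so the example runs quickly, but the same code with n = 10000 would give the 100,000 by 100,000 matrix from the question):

```python
import scipy.sparse as sp

# Build a 10n-by-10n sparse random matrix out of a 10x10 grid of
# n-by-n random blocks, sidestepping the single large sp.rand call.
# n = 1000 here for speed; n = 10000 reproduces the case above.
n = 1000
d = 0.0001
blocks = [[sp.rand(n, n, d) for _ in range(10)] for _ in range(10)]
a = sp.bmat(blocks, format='csr')
print(a.shape)  # (10000, 10000)
```

Each block contributes its own nonzeros at disjoint positions, so the assembled matrix has the same overall density d as a direct sp.rand call would.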