Is there a way to allocate enough memory for scipy sparse matrix functions to process large datasets?
Specifically, I'm attempting to use Asymmetric Least Squares Smoothing (translated into Python here, and the original here) to perform baseline correction on a large mass spectrometry dataset (~60,000 points).
The function (see below) uses scipy.sparse matrix operations.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter):
    L = len(y)
    # Second-order difference matrix, shape (L, L-2), built from a dense identity
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)        # diagonal weight matrix
        Z = W + lam * D.dot(D.transpose())    # penalized system matrix
        z = spsolve(Z, w * y)                 # current baseline estimate
        w = p * (y > z) + (1 - p) * (y < z)   # asymmetric reweighting
    return z
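For context, the last line of the loop is what makes the smoothing asymmetric: as I understand Eilers' method, points above the current baseline estimate get weight p and points below get 1 - p. A toy example (values made up purely for illustration):

y = np.array([1.0, 5.0, 1.0])   # one peak
z = np.array([2.0, 2.0, 2.0])   # current baseline estimate
p = 0.1
w = p * (y > z) + (1 - p) * (y < z)
# w == [0.9, 0.1, 0.9]: the peak at y = 5 barely pulls the next fit upward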
I have no problem when I pass datasets that are 10,000 points or fewer in length:
baseline_als(np.ones(10000), 100, 0.1, 10)
But when passing larger data sets, e.g.
baseline_als(np.ones(50000), 100, 0.1, 10)
I get a MemoryError on the line
D = sparse.csc_matrix(np.diff(np.eye(L), 2))
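I assume the problem is that np.eye(L) materializes a dense L x L float64 array before anything sparse is built. A quick back-of-the-envelope estimate:

L = 50000
dense_identity_bytes = L * L * 8   # np.eye(L) is dense float64
print(dense_identity_bytes / 1e9)  # ~20 GB, and np.diff allocates comparable copies

If that's right, would the fix be to construct the second-order difference matrix directly in sparse form, skipping the dense intermediate entirely? A sketch of what I have in mind (assuming I have the diagonals right; this D should match the (L, L-2) shape that np.diff(np.eye(L), 2) produces):

# Column j of D is e_j - 2*e_{j+1} + e_{j+2}, so the values 1, -2, 1
# sit on diagonals 0, -1, -2 of an (L, L-2) matrix.
D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2), format='csc')

Is that the right approach, or is there a way to give the existing code more memory to work with?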