How can I run SVD and NMF on an extremely sparse matrix of dimensions, say, 70,000 x 70,000? The sparse version of this matrix can be stored on disk as a binary file of under 700 MB. Can I factorize it while keeping everything in a sparse or low-rank format (on disk, or small enough to hold in memory) without ever reconstructing the dense matrix, which would be impossible to hold in memory and hard even to store on disk?
I know there is irlba in R, and sklearn and pymf in Python, but it seems they need the full matrix reconstructed? The problem with SVD is that I cannot store the full factors U, S and V, but what if I specify a k and keep only the truncated factors U_k, S_k and V_k corresponding to the k largest singular values? And as for NMF, I want to factorize it into W and H with rank = 100, which can be stored in memory.
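To make the truncated-SVD part concrete, here is a minimal sketch of what I have in mind, using scipy.sparse.linalg.svds, which operates directly on a sparse matrix. The 1000 x 1000 random matrix is just a small stand-in for the real 70,000 x 70,000 one:

```python
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Small random sparse matrix as a stand-in for the real 70000 x 70000 one.
A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)

k = 100  # number of singular triplets to keep
# svds works on the sparse matrix directly; the input is never densified.
U, s, Vt = svds(A, k=k)

# Only the truncated factors are dense, and they are small:
# U is (1000, k), s is (k,), Vt is (k, 1000).
print(U.shape, s.shape, Vt.shape)
```

For the real matrix, the truncated factors would be 70,000 x 100 each, which easily fits in memory. Is this the right approach, and will it scale?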
And if there are ways to do this, what is the expected time to compute the SVD and NMF? Any help will be appreciated!
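Similarly for NMF, this is roughly what I am hoping works; sklearn's NMF accepts scipy sparse input, and again the small random non-negative matrix here is only a placeholder for the real one:

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import NMF

# Stand-in sparse matrix; scipy.sparse.random draws entries from [0, 1),
# so the matrix is non-negative, as NMF requires.
X = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)

model = NMF(n_components=100, init="random", max_iter=200, random_state=0)
W = model.fit_transform(X)   # (1000, 100), dense but small
H = model.components_        # (100, 1000)
print(W.shape, H.shape)
```

If this is viable, how does the runtime grow with the matrix size and the number of nonzeros?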