So I have these ginormous matrices X and Y. X and Y both have 100 million rows, and X has 10 columns. I'm trying to implement linear regression with these matrices, and I need the quantity (X^T * X)^-1 * X^T * Y. How can I compute this as space-efficiently as possible?
Right now I have
X = readMatrix("fileX.txt")
Y = readMatrix("fileY.txt")
return (X.getT() * X).getI() * X.getT() * Y
How many matrices are being stored in memory here? Are more than two matrices being stored at once? Is there a better way to do it?
I have about 1.5 GB of memory for this project. I can probably stretch it to 2 or 2.5 GB if I close every other program. Ideally the process would also run quickly, but the memory bound is the stricter constraint.
The other approach I've tried is saving the intermediate steps of the calculation as text files and reloading them after every step, but that is very slow.
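For scale: if the entries are stored as 8-byte floats, X alone is 100 million * 10 * 8 bytes, or about 8 GB, so it cannot be loaded whole within this budget. One possible direction, sketched below under some assumptions, is to never hold X or Y in memory at all. X^T * X is only 10 x 10 and X^T * Y has only 10 rows, and both are sums of per-row contributions, so they can be accumulated while streaming the files. The names `streaming_least_squares`, `solve`, and `read_rows` are hypothetical helpers, not part of any library, and the file format (whitespace-separated numbers, one matrix row per line) is an assumption, since the format read by readMatrix isn't specified here. The final step solves the small system rather than forming the inverse explicitly.

```python
def read_rows(path):
    """Yield one row of floats per non-empty line.

    Assumes whitespace-separated text, one matrix row per line.
    """
    with open(path) as f:
        for line in f:
            if line.strip():
                yield [float(v) for v in line.split()]


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n, k = len(a), len(b[0])
    # Build the augmented matrix [a | b] so row operations apply to both.
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("X^T X is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + k):
                m[r][c] -= f * m[col][c]
    # Back substitution, one column of b at a time.
    beta = [[0.0] * k for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(k):
            s = m[i][n + j] - sum(m[i][c] * beta[c][j] for c in range(i + 1, n))
            beta[i][j] = s / m[i][i]
    return beta


def streaming_least_squares(rows_x, rows_y):
    """Return (X^T X)^-1 X^T Y while holding only one row of X and Y at a time."""
    xtx = None  # p x p accumulator for X^T X
    xty = None  # p x k accumulator for X^T Y
    for x, y in zip(rows_x, rows_y):
        if xtx is None:
            p, k = len(x), len(y)
            xtx = [[0.0] * p for _ in range(p)]
            xty = [[0.0] * k for _ in range(p)]
        for i in range(p):
            xi = x[i]
            for j in range(p):
                xtx[i][j] += xi * x[j]
            for j in range(k):
                xty[i][j] += xi * y[j]
    if xtx is None:
        raise ValueError("no rows read")
    return solve(xtx, xty)
```

Under those assumptions it would be called as `streaming_least_squares(read_rows("fileX.txt"), read_rows("fileY.txt"))`. Memory use is a few small lists regardless of the row count. The per-row Python loop will be slow for 100 million rows; reading the files in blocks of rows and accumulating the same two sums with a matrix library should give the same result much faster, at the cost of one block's worth of memory.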