I am trying to learn ML using Kaggle datasets. In one of the problems (using logistic regression), the input and parameter matrices are of size (1110001, 8) and (2122640, 8) respectively.
I am getting a MemoryError when multiplying them in Python. I guess this would be the same in any language, since the result is simply too big. My question is: how are matrices multiplied in real-life ML implementations, where data is usually this big?
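For reference, this is roughly what I'm doing (random data just to reproduce the shapes; the last line is where the MemoryError happens):

```python
import numpy as np

# X: inputs, theta: parameters, shapes as described above
X = np.random.rand(1110001, 8).astype(np.float32)
theta = np.random.rand(2122640, 8).astype(np.float32)

# this raises MemoryError: the result would be a
# (1110001, 2122640) float32 array, roughly 9.4 TB
scores = X @ theta.T
```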
Things bugging me:
Some people on SO have suggested calculating the dot product in parts and then combining the results (roughly as in the sketch below). But even then, the resulting matrix would still be too big for RAM (about 9.42 TB in this case?).
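As far as I understand it, the suggested partial computation looks something like this (my own sketch, with a made-up chunk_size):

```python
import numpy as np

# my understanding of the "compute it in parts" suggestion:
# process a small block of rows of X at a time
def chunked_scores(X, theta, chunk_size=16):
    for start in range(0, X.shape[0], chunk_size):
        # block shape: (chunk_size, theta.shape[0]); a single block fits in RAM,
        # but all the blocks together are still the full ~9.4 TB result
        block = X[start:start + chunk_size] @ theta.T
        yield block
```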
And if I write it to a file, wouldn't it be too slow for an optimization algorithm to read it back from disk while minimizing the cost function?
Even if I do write it to a file, how would fmin_bfgs (or any optimization function) read from that file? (My current in-memory call is shown below for context.)
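This is how I'm calling fmin_bfgs at the moment, with everything in memory (tiny random data just to show the call); I don't see where reading from a file would fit in:

```python
import numpy as np
from scipy.optimize import fmin_bfgs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # negative log-likelihood for logistic regression
    h = sigmoid(X @ theta)
    eps = 1e-12  # avoid log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def grad(theta, X, y):
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)

# tiny dummy data just to show the call; the real X is (1110001, 8)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = (rng.random(100) > 0.5).astype(float)

theta_opt = fmin_bfgs(cost, np.zeros(8), fprime=grad, args=(X, y))
```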
Also, the Kaggle notebook shows only 1 GB of storage available, and I don't think anyone would allow TBs of storage space.
In my input matrix, many rows have similar values in some columns. Can I use that to my advantage to save space, the way a sparse matrix exploits zeros (as in the example below)?
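What I mean by the sparse-matrix case, for comparison (a toy example of how scipy.sparse stores only the non-zero entries):

```python
import numpy as np
from scipy import sparse

# toy matrix that is mostly zeros
dense = np.zeros((1000, 8))
dense[::50, 3] = 1.0  # only 20 non-zero entries

compressed = sparse.csr_matrix(dense)
print(dense.nbytes)            # 64000 bytes for the dense array
print(compressed.data.nbytes)  # 160 bytes of values (plus small index arrays)
```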
Can anyone point me to a real-life sample implementation of such a case? Thanks!