I have a really large matrix that simply won't fit into memory. The matrix I have to work with has 483798149136 elements, that is, roughly 483 billion floating point numbers (at 8 bytes per 64-bit float, that's close to 3.9 TB).
The approach I was thinking of is to somehow split this huge matrix into submatrices that do fit into memory, perform pooling operations on those submatrices, and later join them all back together to rebuild the original matrix, which will hopefully fit into memory after all the pooling operations.
Please correct me if I'm wrong; this approach is just an idea I came up with, and I don't know how good or bad it is. If you have any better alternative ideas, I'm open to suggestions.
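To make the pooling idea concrete, here is a minimal, untested sketch of what I have in mind, demonstrated on a small dummy array rather than the real data. The 4 x 4 window size and the use of mean pooling are just assumptions for illustration.

import numpy as np

def block_pool(m, block):
    # Mean-pool a square (N, N) array over non-overlapping block x block
    # windows, giving an (N // block, N // block) result.
    n = m.shape[0] // block
    trimmed = m[:n * block, :n * block]
    return trimmed.reshape(n, block, n, block).mean(axis=(1, 3))

# Tiny example: a 12 x 12 array pooled with 4 x 4 windows -> 3 x 3 result
small = np.arange(144, dtype=float).reshape(12, 12)
pooled = block_pool(small, block=4)
print(pooled.shape)   # (3, 3)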
The code to reproduce this matrix would be:
import numpy as np

a = np.arange(695556).reshape(834, 834)
np.meshgrid(a, a)   # meshgrid flattens its inputs, so this builds two 695556 x 695556 arrays
I have been reading this post and this post, among others on this same site, but none of them provides a real solution to this kind of problem; they just give vague suggestions.
My questions now are:
Is my splitting and pooling approach feasible, or are there other, better ways of doing this?
How (in code terms) could I split this matrix into pieces (like windows or multidimensional kernels) and rebuild it again later?
Is there some way to process a matrix in chunks in numpy, so I can still perform operations with the matrix later, like multiplication, addition, etc.? (See the sketch right after this list for the kind of thing I mean.)
Is there a specific package in Python that helps with dealing with this kind of matrix problem?
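To make the chunked-processing question concrete, here is an untested sketch of what I imagine: a disk-backed array that I only touch one band of rows at a time. The file name, block size, dtype, and demo sizes are placeholders I made up; the real matrix would be 695556 x 695556.

import numpy as np

# Placeholder sizes for the sketch; the real matrix would be (695556, 695556),
# which as float32 is roughly 1.9 TB on disk.
n = 4096
block = 512

# Disk-backed array instead of an in-memory one (the file name is made up)
big = np.memmap('big_matrix.dat', dtype=np.float32, mode='w+', shape=(n, n))

# Process the matrix one band of rows at a time
for start in range(0, n, block):
    stop = min(start + block, n)
    band = big[start:stop, :]
    band[:] = band * 2.0 + 1.0   # stand-in for whatever operation I actually need
big.flush()                      # make sure everything is written back to disk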
EDIT
Since some users are asking about the goal of this whole operation, I'll provide some info:
I'm working on a 3D printing project. In the process, a laser beam melts metal powder to create complex metal pieces. These pieces are built in layers, and the laser melts the metal layer by layer.
I have 3 CSV files, each one containing an 834 x 834 matrix. The first matrix contains the X-axis coordinate values as the laser beam goes through the powder bed melting the metal, the second matrix is the same for the Y axis, and the third matrix holds the time the laser spends melting each pixel point. The time values are expressed in seconds.
So I have the coordinates of the laser along the X and Y axes, and the time it takes to melt each point.
These matrices come from images of the sections of each manufactured piece.
The issue is that the temperature at a certain pixel, and the time the laser spends there, can influence pixel n when the laser reaches it. So I want to create a distance matrix that tells me how different or similar each pixel of the image is to every other pixel, in terms of Euclidean distance.
This is why, if I have for instance two 834 x 834 matrices, I need to create a 695556 x 695556 matrix with the distances between every single point in the matrix and every other point. And this is why it is so huge and will not fit into memory.
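To show what I mean, here is an untested sketch of how I imagine building that distance matrix block by block, writing each block to a disk-backed array instead of holding the whole 695556 x 695556 result in memory. The CSV file names, the float32 dtype, and the block size are placeholders, and I don't know if this is a sensible way to do it.

import numpy as np

# Load the three 834 x 834 matrices (file names are placeholders)
x = np.loadtxt('x_coords.csv', delimiter=',')
y = np.loadtxt('y_coords.csv', delimiter=',')
t = np.loadtxt('times.csv', delimiter=',')

# One (x, y, t) feature vector per pixel -> shape (695556, 3)
features = np.stack([x.ravel(), y.ravel(), t.ravel()], axis=1).astype(np.float32)

n = features.shape[0]
block = 100   # kept small: the temporary below has shape (block, n, 3)

# Disk-backed output; the full file would be about 1.9 TB as float32
dist = np.memmap('distances.dat', dtype=np.float32, mode='w+', shape=(n, n))

for i in range(0, n, block):
    rows = features[i:i + block]                       # (block, 3)
    diff = rows[:, None, :] - features[None, :, :]     # (block, n, 3), held in memory
    dist[i:i + block, :] = np.sqrt((diff ** 2).sum(axis=2))
dist.flush()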
I don't know if I have given too much information or if my explanations are messy. You can ask whatever you need and I'll try to clarify it, but the main point is that I need to create this huge distance matrix in order to know the mathematical distances between pixels, and then work out the relation between what is happening at a certain point of the piece while printing it and what needs to happen at other points to avoid manufacturing defects.
Thank you very much in advance