
I have a really large matrix that simply won't fit into memory. The matrix I have to work with has 483,798,149,136 elements, that is, roughly 484 billion floating point numbers.

The approach I was thinking about was to somehow split this huge matrix into submatrices that fit into memory, perform pooling operations on those submatrices, and later join them all back together to rebuild the original matrix, which will hopefully fit into memory after all the pooling operations.
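To make the pooling idea concrete, here is a minimal sketch of what I mean, on a toy matrix and assuming simple mean pooling over k x k blocks (the sizes here are just examples, not my real data):

import numpy as np

# toy stand-in for one submatrix that does fit into memory
sub = np.arange(36, dtype=np.float64).reshape(6, 6)

k = 2  # example pooling window size
h, w = sub.shape

# mean-pool non-overlapping k x k blocks by reshaping:
# (6, 6) -> (3, 2, 3, 2), then average over the two window axes -> (3, 3)
pooled = sub.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

print(pooled.shape)  # (3, 3)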

Please correct me if I'm wrong; this approach was just an idea I came up with, and I don't know how good or bad it is. If you have any better alternative ideas, I'm open to suggestions.

The code to reproduce this matrix would be:

import numpy as np
a = np.arange(695556).reshape(834, 834)
np.meshgrid(a, a)  # builds two 695556 x 695556 arrays, which is what blows up memory

I have been reading this post and this post, among others on this same site, but none of them provides a real solution to this kind of problem; they just give vague suggestions.

My questions now are:

  1. Is my splitting and pooling approach feasible, or are there other, better ways of doing this?

  2. How (in code terms) could I split this matrix into pieces (like windows or multidimensional kernels) and rebuild it again later?

  3. Is there some way to process the matrix in chunks in numpy, so that operations like multiplication, addition, etc. can be performed on it later?

  4. Is there a specific Python package that helps with this kind of matrix problem?

EDIT

Since some users are asking about the goal of this whole operation, I'll provide some info:

I'm working on a 3D printing project. In the process, a laser beam melts metal powder to create complex metal pieces. These pieces are built up in layers, and the laser melts the metal layer by layer.

I have 3 csv files, each containing an 834 x 834 matrix. The first matrix contains the X-axis coordinate values as the laser beam travels over the powder bed melting the metal, the second matrix is the same for the Y axis, and the third matrix represents the time the laser spends melting at each pixel point. The time values are expressed in seconds.

So I have the coordinates of the laser along the X and Y axes, and the time it takes to melt each point.

These matrices come from images of the sections of each manufactured piece.

The issue is that the temperature at a certain pixel, and the time the laser spends at that pixel, can influence the nth pixel by the time the laser gets there. So I want to create a distance matrix that tells me how different or similar, in terms of Euclidean distance, each pixel of the image is to every other pixel.

This is why, if I have for instance two 834 x 834 matrices, I need to create a 695556 x 695556 matrix with the distances between every single point and every other point. And this is why it is so huge and will not fit into memory.
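To be precise about the distance I mean, here is a minimal sketch that computes just one small row block of that matrix with made-up coordinate data (the chunk size of 100 is only a placeholder; the full matrix would need 695556 such rows):

import numpy as np

# toy stand-ins for my real 834 x 834 coordinate matrices read from the csv files
X = np.random.rand(834, 834)
Y = np.random.rand(834, 834)

x = X.ravel()  # 695556 x coordinates
y = Y.ravel()  # 695556 y coordinates

# Euclidean distances from the first 100 pixels to every other pixel;
# the full 695556 x 695556 matrix would be built from many such row blocks
chunk = 100
d_block = np.sqrt((x[:chunk, None] - x[None, :]) ** 2 +
                  (y[:chunk, None] - y[None, :]) ** 2)

print(d_block.shape)  # (100, 695556)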

I don't know if I've given too much information, or if my explanations are messy. You can ask whatever you need and I'll try to clarify it, but the main point is that I need to create this huge distance matrix in order to know the mathematical distances between pixels, and then relate what happens at a certain point of the piece while printing it to what needs to happen at other points to avoid manufacturing defects.

Thank you very much in advance

Miguel 2488
  • You've listed a few good options. For 1, memmapped arrays allow selectively loading parts of the data; they work with most code with minimal modification (a minimal memmap sketch follows these comments). For 3, numexpr is similar, although it is mainly meant to provide memory locality. It won't fix out-of-memory errors on its own, but combined with memmapped arrays it could work well. – user2699 Mar 01 '19 at 14:08
  • Hi @user2699, thank you for your answer!! Could you provide more information about how memmapped arrays and numexpr work, please? – Miguel 2488 Mar 01 '19 at 14:11
  • I'll avoid too many details as I don't know the exact problem you're trying to solve. They're both mature enough to have good documentation online (memmap is part of numpy, which has very comprehensive docs). I'd suggest starting with some of the questions on Stack Overflow just to get a feel for what's possible (https://stackoverflow.com/search?q=numpy+memmapped). – user2699 Mar 01 '19 at 14:24
  • Although if your final output fits in memory, memmapped arrays may not be the best choice. They'll introduce quite a bit of overhead reading to or from disk. – user2699 Mar 01 '19 at 14:25
  • Maybe knowing more about the algorithm you are working on could help. Sometimes the NumPy vectorized way can get much more expensive in memory, whereas a Numba-accelerated loop can be very fast while using little memory. – jdehesa Mar 01 '19 at 14:27
  • Ok thank you!! I'll have a look at the docs and hopefully it will help me with this situation – Miguel 2488 Mar 01 '19 at 14:27
  • Hi @jdehesa I'll update the question providing some more info about my goal with this matrix – Miguel 2488 Mar 01 '19 at 14:28
  • Hi @everyone. The question was updated with extra information about my work and the matrix. – Miguel 2488 Mar 01 '19 at 14:57
  • @Miguel2488 Thanks for the additional detail. So you need to have all these distances available at the same time, then? (not, like, the greatest, or smallest, or a neighborhood or something like that). One thing to note is it should be possible to compute and store only half of the pairwise distances, since it's a symmetric relation. – jdehesa Mar 01 '19 at 15:16
  • Good explanation, that opens up some possibilities. One more thing, what exactly do you do with the resulting distances? I've done laser engraving where there's a tendency for the laser to overheat the material if it's near a spot too long, this sounds like a similar problem. If that's the case, it seems like you could do some processing of the time matrix without needing pairwise distances between every point. Can you share the structure of x, y and t, and the conditions you're looking for? – user2699 Mar 01 '19 at 15:27
  • Hi again @user2699. Thank you. Indeed, yes, it sounds like a similar case. As the laser passes over a certain pixel, the temperature of the nearby pixels is increased by heat radiation, so less laser time is needed when passing over those nearby pixels to melt the powder; it's all about applying the correct temperature to every single point. The point of calculating the distances is to see the relations in temperature terms between the points, so we can understand how what happens in one section of the piece can affect another section – Miguel 2488 Mar 01 '19 at 15:39
  • @user2699 If you can clarify what you would need to know about X, Y and T, I could share it, yes. What do you mean by structure? – Miguel 2488 Mar 01 '19 at 15:40
  • Well, for example, are the xs and ys sequential? Or is there some other pattern that could make some calculations redundant? – user2699 Mar 01 '19 at 16:23
  • Well, we could talk about sequences in the case of the T matrix, which contains the time values for the laser beam standing at a certain position, indicated by the coordinates of the X and Y matrices – Miguel 2488 Mar 01 '19 at 21:16
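A minimal sketch of the memmapped-array idea mentioned in the comments above (the file name, dtype and shape here are placeholder assumptions; the real matrix would be far larger):

import numpy as np

# create a disk-backed array; only the parts actually touched are held in RAM
shape = (695556, 1000)  # placeholder shape
mm = np.memmap('distances.dat', dtype=np.float32, mode='w+', shape=shape)

# write one block of rows at a time
mm[0:1000, :] = 0.0
mm.flush()

# later, reopen read-only and pull in just the block that is needed
ro = np.memmap('distances.dat', dtype=np.float32, mode='r', shape=shape)
block = np.array(ro[0:10, 0:10])  # copies only this small block into memory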

2 Answers


After all, I figured out a way to solve my problem. These huge matrices can be easily handled using dask. Dask is a Python library for distributed computing that splits data into chunks to optimize memory usage. It's pretty handy since it really lets you work with literally massive data at very low computing and memory cost; obviously it's not as fast as in-memory computing, but I think many people will be glad to know about it.

This package is well optimized and frequently updated. Best of all, it uses numpy/pandas syntax; it works with dataframes the same way as with arrays, and if you know pandas/numpy you will feel at home with dask.

You can create a dask distributed array like this:

import dask.array as da

# a lazily evaluated random array, split into 1000 x 1000 chunks
Y = da.random.normal(size=(695556, 695556), chunks=(1000, 1000))

and then, you can perform some operations on it like this:

y = Y.mean(axis=0)[0:100].compute()
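For the distance problem in the question, a sketch along these lines (with made-up coordinates and an arbitrary chunk size) shows how the whole pairwise distance matrix can be defined lazily and materialised only block by block:

import numpy as np
import dask.array as da

# placeholder coordinate vectors standing in for the flattened X and Y matrices
x = da.from_array(np.random.rand(695556), chunks=5000)
y = da.from_array(np.random.rand(695556), chunks=5000)

# the full 695556 x 695556 Euclidean distance matrix, defined lazily;
# no memory is allocated for it until a piece is actually computed
dist = da.sqrt((x[:, None] - x[None, :]) ** 2 + (y[:, None] - y[None, :]) ** 2)

# only this 1000 x 1000 block is ever materialised in RAM
block = dist[:1000, :1000].compute()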

Also, if you use the memory_profiler package you can monitor your memory usage and see how much memory your huge data is consuming during computations.
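As a rough sketch of how that monitoring could look (this reuses the toy computation above and assumes the memory_usage helper from memory_profiler):

import dask.array as da
from memory_profiler import memory_usage

def small_computation():
    Y = da.random.normal(size=(695556, 695556), chunks=(1000, 1000))
    return Y.mean(axis=0)[0:100].compute()

# memory_usage samples the process memory (in MiB) while the function runs
samples = memory_usage((small_computation, (), {}))
print(f"peak memory: {max(samples):.1f} MiB")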

There are some practical examples that I found very illustrative here.

Also, an explanation of the array functionality of this library can be found here.

And lastly, a guide about high performance computations in Python 3.x can be found here.

Hope this helps someone with this same issue.

Miguel 2488

Since matrix calculations are row by row or column by column, I think you can easily reduce them. Consider 200 25x50 matrices multiplied by 50x100 matrices: you should get 200 25x100 matrices, right? First you can split the 200 into groups of n, for instance multiplying 20 at a time, and then concatenate the resulting numpy arrays. Better than that, you can use 50 x m instead of 50x100 and have numpy multiply 25x50 with 50x10, for example, and then concatenate again (reshape, join, reshape again, or use advanced indexing).

Here comes the part worth mentioning: for instance, you have two 10E9 x 10E9 matrices and want to multiply them. Split each of them into a 20x20 grid of sub-blocks, multiply the blocks, and after that sum and reshape the results. Suppose you have a GPU with little memory; I believe you could stream data into its memory in one process while it multiplies the previous chunk of data, using multiprocessing. I have never tried that, but I am going to try it with a GT 1030 or 6500 XT using cupy or pyopencl, and will edit this post to add that as an example.
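A minimal in-memory sketch of that block-by-block multiplication (the sizes and the block value are arbitrary; in a real out-of-core setting each sub-block could be loaded from disk or shipped to the GPU instead):

import numpy as np

def blocked_matmul(A, B, block=256):
    # multiply A @ B one pair of sub-blocks at a time, accumulating into C
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                C[i:i+block, j:j+block] += A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
    return C

# small check against numpy's own result
A = np.random.rand(500, 300)
B = np.random.rand(300, 400)
assert np.allclose(blocked_matmul(A, B), A @ B)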

There is also a way that doesn't involve re-inventing all that. https://medium.com/analytics-vidhya/matrix-multiplication-at-scale-using-map-reduce-d5dc16710095

Gediz GÜRSU