Very large numpy array doesn't throw memory error. Where does it live?

Question

So I have the following numpy array:

import numpy as np
X = np.zeros((1000000000, 3000), dtype=np.float32)

X.nbytes returns 12000000000000, which is 12 TB.

I certainly don't have that much memory (8GB to be exact). How did this happen? Where is the array allocated?
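For reference, the 12 TB figure can be checked without creating the array at all. This is only a small sketch; the helper name expected_nbytes is made up for illustration and is not part of numpy.

```python
import math
import numpy as np

def expected_nbytes(shape, dtype):
    """Bytes an array of this shape and dtype would occupy, computed without allocating it.

    Hypothetical helper for illustration: element count times bytes per element.
    """
    return math.prod(shape) * np.dtype(dtype).itemsize

size = expected_nbytes((1000000000, 3000), np.float32)
print(size)           # 12000000000000
print(size / 10**12)  # 12.0 (terabytes, decimal)
```

float32 takes 4 bytes per element, so 1000000000 * 3000 * 4 gives the same number that X.nbytes reports.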

score 6 · Accepted Answer · answered Oct 19 '17 at 03:53


I guess you are using a Mac. OS X will automatically use all available disk space as virtual memory. So maybe you have a biiiiiiig disk?

This code raises a MemoryError on Linux.
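If you want code that behaves the same way on both systems, one option is to catch the error yourself. A minimal sketch, assuming you are happy to get None back when the OS refuses the request; try_zeros is a hypothetical helper name, not a numpy function.

```python
import numpy as np

def try_zeros(shape, dtype=np.float32):
    """Return a zero-filled array, or None if the allocation is refused.

    Hypothetical helper for illustration only.
    """
    try:
        return np.zeros(shape, dtype=dtype)
    except MemoryError:
        # Raised when the OS will not hand out that much address space.
        return None

small = try_zeros((1000, 3))
print(small is not None)
```

Calling try_zeros((1000000000, 3000)) should give None on a machine that rejects the request and an array on a machine like the asker's. Note that getting an array back does not mean 12 TB of physical memory is actually available.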


Sraw


Ahhh. So it's an OS thing. – user3813674 Oct 19 '17 at 04:32

Matt Popovich · Answer 2 · 2017-10-19T04:38:12.057

I ran this on my Mac (OS 10.13, 16GB RAM, 512GB SSD) and got the same successful result that you did.

This comment seems like a possible answer. In summary: since you're using zeros(), there's no need for each cell of the matrix to take up 4 bytes when they all hold the same value. Instead, behind the scenes, numpy may be explicitly storing in memory only the values in the matrix that differ from a common value (in this case, zero).

It's worth noting that running np.random.rand(1000000000, 3000), which builds a matrix of the same shape as zeros() but fills it with actual data, causes some havoc on my Mac. RAM gets maxed out, and then the system begins to use the swap partition.

[Images: before np.random.rand() and during np.random.rand()]
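The same contrast can be tried on a much smaller scale. Below is a rough sketch, assuming a Unix-like system (the resource module is not available on Windows). The function names are made up for illustration, and ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so only compare the numbers to each other on the same machine.

```python
import resource
import numpy as np

def peak_rss():
    """Peak resident memory of this process, in platform-dependent units."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def compare_zeros_and_fill(shape=(4096, 4096)):
    """Report how peak resident memory grows after zeros() and after writing data.

    Hypothetical helper: returns (requested bytes, growth after zeros,
    growth after filling the array with non-zero values).
    """
    before = peak_rss()
    X = np.zeros(shape, dtype=np.float32)  # 64 MiB requested with the default shape
    after_zeros = peak_rss()
    X[:] = 1.0                             # writes to every element of the array
    after_fill = peak_rss()
    return X.nbytes, after_zeros - before, after_fill - after_zeros

requested, grew_on_zeros, grew_on_fill = compare_zeros_and_fill()
print(requested, grew_on_zeros, grew_on_fill)
```

If the growth after zeros() is much smaller than the growth after the fill, that suggests memory is only really consumed once the array is written to. The exact numbers depend on the OS and the allocator, so treat this as an experiment, not a guarantee.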
