
I'm running a Notebook in JupyterLab. I am loading some large Monte Carlo chains as numpy arrays, each with shape (500000, 150). I have 10 chains, which I load into a list in the following way:

import numpy as np

chains = []
for i in range(10):
    chain = np.loadtxt('my_chain_{}.txt'.format(i))
    chains.append(chain)

If I load 5 chains then all works well. If I try to load 10 chains, after about 6 or 7 I get the error:

Kernel Restarting
The kernel for my_code.ipynb appears to have died. It will restart automatically.

I have tried loading the chains in different orders to make sure there is not a problem with any single chain. It always fails when loading number 6 or 7 no matter the order, so I think the chains themselves are fine.

I have also tried to load 5 chains in one list and then, in the next cell, load the other 5, but the failure still happens when I get to chain 6 or 7, even when I split things up like this.

So it seems like the problem is that I'm loading too much data into the Notebook or something like that. Does this seem right? Is there a workaround?

user1551817

2 Answers


It is indeed possible that you are running out of memory, though it is unlikely to be your system as a whole that is running out (unless it's a very small system); more likely Jupyter is hitting its own memory limit. It is typical behavior that when Jupyter exceeds its memory limits, the kernel dies and restarts; see here, here and here.
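If you want to see where the memory is actually going, one quick check (a sketch only; it assumes the third-party psutil package is available, which the question does not mention) is to print the free system memory after each chain is loaded. If it stays comfortably above zero when the kernel dies, that points at a Jupyter-level limit rather than the machine itself:

import numpy as np
import psutil  # third-party package, assumed to be installed for this check

chains = []
for i in range(10):
    chain = np.loadtxt('my_chain_{}.txt'.format(i))
    chains.append(chain)
    # Report free system memory after each chain to see how quickly it drops.
    print('loaded chain {}: {:.1f} GB available'.format(
        i, psutil.virtual_memory().available / 1024**3))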

Consider that if you are using the float64 datatype by default, the memory usage (in megabytes) per array is:

N_rows * N_cols * 64 / 8 / 1024 / 1024

For N_rows = 500000 and N_cols = 150, that's 572 megabytes per array. You can verify this directly using numpy's dtype and nbytes attributes (noting that the output is in bytes, not bits):

chain = np.loadtxt('my_chain_{}.txt'.format(i))
print(chain.dtype)                 # float64 by default
print(chain.nbytes / 1024 / 1024)  # size of this array in megabytes

If you are trying to load 10 of these arrays, that's about 6 gigabytes.
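For reference, the same arithmetic in a couple of lines of Python:

# float64 stores 8 bytes (64 bits) per value.
n_rows, n_cols, n_chains = 500000, 150, 10
per_chain_mb = n_rows * n_cols * 8 / 1024 / 1024
print(per_chain_mb)                    # ~572 MB per chain
print(per_chain_mb * n_chains / 1024)  # ~5.6 GB for all 10 chains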

One workaround is increasing the memory limits for Jupyter, per the posts referenced above. Another simple workaround is using a less memory-intensive floating point datatype. If you don't really need the digits of accuracy afforded by float64 (see here), you could simply use a smaller floating point representation, e.g. float32:

chain = np.loadtxt('my_chain_{}.txt'.format(i), dtype=np.float32) 
chains.append(chain)

Given that you can already get to 6 or 7 chains, halving the memory usage of each chain should be enough to get you to 10.
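Putting the pieces together, the full loading loop would look something like this (same filenames as in the question; the print is only there to confirm the halved per-chain footprint):

import numpy as np

chains = []
for i in range(10):
    # float32 halves the memory footprint compared with the default float64.
    chain = np.loadtxt('my_chain_{}.txt'.format(i), dtype=np.float32)
    chains.append(chain)
    print(chain.dtype, chain.nbytes / 1024 / 1024)  # ~286 MB per chain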

danmohad

You could be running out of memory. Try loading the chains one by one and concatenating them into a single array as you go:

import numpy as np

chains = []
for i in range(10):
    chain = np.loadtxt('my_chain_{}.txt'.format(i))
    chains.append(chain)
    if i > 0:
        # Merge the new chain into the running array kept at chains[0],
        # then drop the now-redundant second list entry.
        chains[0] = np.concatenate((chains[0], chains[1]), axis=0)
        chains.pop(1)
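A variant of the same idea, if a single combined array is what you want in the end, is to preallocate it once and copy each chain into its slot as it is loaded; this avoids the extra copy that np.concatenate makes on every iteration. A rough sketch, taking the (500000, 150) shape and filenames from the question:

import numpy as np

n_chains, n_rows, n_cols = 10, 500000, 150
# Allocate the final array once, then fill it block by block.
all_chains = np.empty((n_chains * n_rows, n_cols))
for i in range(n_chains):
    chain = np.loadtxt('my_chain_{}.txt'.format(i))
    all_chains[i * n_rows:(i + 1) * n_rows] = chain
    del chain  # drop the temporary before reading the next file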
James
  • Thank you. Am I not already loading them one-by-one in my original code? We are both looping over a range of 10 and loading in the chains and then appending / concatenating them to a list or an array no? – user1551817 Sep 19 '22 at 10:12