Jupyter Lab freezes the computer when out of RAM - how to prevent it?

Question

I have recently started using Jupyter Lab, and my problem is that I work with quite large datasets (usually the dataset itself is approx. 1/4 of my computer's RAM). After a few transformations, each saved as a new Python object, I tend to run out of memory. The issue is that when I'm approaching the available RAM limit and perform any operation that needs more RAM, my computer freezes, and the only way to fix it is to restart it. Is this the default behaviour in Jupyter Lab/Notebook, or is there some setting I should change? Normally, I would expect the program to crash (as in RStudio, for example), not the whole computer.

I had the same problem before, it's really nasty. I had a quick look in the jupyter issues and found nothing. Does it happen also if you run through the IPython (not plain python) console? — Martino, Oct 22 '19 at 09:43
Which package/module did you use? Which OS is it? Do you have swap? Which version of Jupyter Lab? If it's Linux, which kernel version? — Nizam Mohamed, Oct 23 '19 at 23:55
It's mostly Pandas, but I don't think it's package-related. The OS is Ubuntu 16.04.6 LTS and the kernel version is 4.15.0-65-generic. The Jupyter Lab version is 1.0.2. I have 12 GB of swap (spread over 2 files), which is 1.5 times my RAM. — jakes, Oct 24 '19 at 06:32

kd88 · Answer 1 · 2019-10-28T08:11:32.753

Absolutely the most robust solution to this problem would be to use Docker containers. You can specify how much memory to allocate to Jupyter, and if the container runs out of memory, only the container is affected rather than the whole machine (just remember to save frequently, but that goes without saying).

This blog post will get you most of the way there. There are also some decent instructions for setting up Jupyter Lab from one of the freely available, officially maintained Jupyter images here:

https://medium.com/fundbox-engineering/overview-d3759e83969c

You can then modify the docker run command described in the tutorial to cap the memory (e.g. at 3 GB):

docker run --memory 3g <other docker run args from tutorial here>

For syntax on the docker memory options, see this question:

What unit does the docker run "--memory" option expect?

emremrah · Answer 2 · 2021-02-13T14:26:42.337

If you are using a Linux-based OS, look into OOM killers; you can get more information here. I don't know the details for Windows.

You can use earlyoom. It can be configured as you wish, e.g. earlyoom -s 90 -m 15 starts earlyoom so that, when free swap is below 90% and available memory is below 15%, it kills the process responsible for the memory pressure and prevents the whole system from freezing. You can also configure the priority of the processes.
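To make the two thresholds concrete, here is a minimal Python sketch of the rule described above. It is only an illustration of the condition, not earlyoom's actual implementation; the function names `parse_meminfo` and `below_thresholds` are made up for this example, and the sample text imitates the format of `/proc/meminfo` on Linux.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (sizes in kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def below_thresholds(info, mem_min_percent=15, swap_min_percent=90):
    """Return True when both available memory and free swap are under their minimums."""
    mem_pct = 100 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    # With no swap configured, treat free swap as 0% so only memory matters.
    swap_pct = 100 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_pct <= mem_min_percent and swap_pct <= swap_min_percent


# Hypothetical snapshot: 10% of RAM available, 50% of swap free.
SAMPLE = """MemTotal:        8000000 kB
MemAvailable:     800000 kB
SwapTotal:      12000000 kB
SwapFree:        6000000 kB
"""

if __name__ == "__main__":
    print(below_thresholds(parse_meminfo(SAMPLE)))
```

On a real machine you could pass the contents of `/proc/meminfo` to `parse_meminfo` to see how close the system is to the configured limits.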

Elizabeth · Answer 3 · 2019-10-25T22:30:48.387

I also work with very large datasets (3 GB) in Jupyter Lab and have been experiencing the same issue. It's unclear whether you need to keep access to the pre-transformed data; if not, you can do what I've started doing and del large dataframe variables once they are no longer needed. del removes the variable name, so the memory can be reclaimed once nothing else references the object. Edit: there are multiple possible causes for the issue I'm encountering. I encounter it more often when I'm using a remote Jupyter instance, and in Spyder as well when I'm performing large transformations.

e.g.

import pandas as pd

df = pd.read_csv('some_giant_dataframe')  # or whatever your import is
new_df = my_transform(df)  # my_transform stands in for your own transformation
del df  # if unneeded
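One caveat worth illustrating: del only removes a name, and the object is freed only when no other reference to it remains. In a notebook, another variable or IPython's output cache (`Out`, `_`) can keep a dataframe alive. The sketch below uses only the standard library; `BigObject` and `is_freed_after_del` are hypothetical names standing in for a large dataframe and a quick check.

```python
import gc
import weakref


class BigObject:
    """Stand-in for a large dataframe: holds n bytes of payload."""

    def __init__(self, n):
        self.payload = bytearray(n)


def is_freed_after_del(keep_second_reference):
    """Return True if the object is gone after `del`, False if it is still alive."""
    obj = BigObject(1_000_000)
    probe = weakref.ref(obj)  # lets us observe the object without keeping it alive
    other = obj if keep_second_reference else None  # simulates a leftover reference
    del obj
    gc.collect()
    return probe() is None


if __name__ == "__main__":
    print(is_freed_after_del(False))
    print(is_freed_after_del(True))
```

If memory does not drop after del in a notebook, it is worth checking whether the dataframe was displayed as a cell output or assigned to a second name.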

Jakes, you may also find this thread on large data workflows helpful. I've been looking into Dask to help with memory storage.

I've noticed in Spyder and Jupyter that the freeze-up usually happens when working in another console while a memory-heavy console is running. As to why it just freezes up instead of crashing out, I think this has something to do with the kernel. There are a couple of memory issues open on the IPython GitHub; #10082 and #10117 seem most relevant. One user here suggests disabling tab completion in jedi or updating jedi.

In #10117 they propose checking the output of get_ipython().history_manager.db_log_output. I have the same issues and my setting is correct, but it's worth checking.

score 1 · Answer 4 · answered Oct 26 '19 at 08:20

You can also run notebooks in the cloud, for example on Google Colab here. It provides hosted runtimes with their own RAM allocation, and Jupyter notebooks are supported by default.

Jishan Shaikh

score 0 · Answer 5 · answered Oct 25 '19 at 00:04

I am going to summarize the answers from the following question. You can limit the memory usage of your program. In the example below, the memory-hungry function is ram_intense_foo(). Before calling it, you need to call memory_limit(), e.g. memory_limit(95) to cap the process at 95% of the memory that is currently available.

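import resource  # Unix-only module
import sys
import numpy as np

def memory_limit(percent_of_free):
    """Cap this process's address space at a percentage of the available RAM."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # /proc/meminfo reports kB; setrlimit() needs a whole number of bytes
    limit = int(get_memory() * 1024 * percent_of_free / 100)
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

def get_memory():
    """Return MemAvailable from /proc/meminfo in kB (Linux only)."""
    with open('/proc/meminfo', 'r') as mem:
        free_memory = 0
        for i in mem:
            sline = i.split()
            if str(sline[0]) == 'MemAvailable:':
                free_memory = int(sline[1])
                break
    return free_memory

def ram_intense_foo(a,b):
    # Example workload: builds an a x b random matrix and its b x b product
    A = np.random.rand(a,b)
    return A.T@A

if __name__ == '__main__':
    memory_limit(95)  # allow up to 95% of the currently available memory
    try:
        temp = ram_intense_foo(4000,10000)
        print(temp.shape)
    except MemoryError:
        # The limit was hit: fail cleanly instead of freezing the machine
        sys.stderr.write('\n\nERROR: Memory Exception\n')
        sys.exit(1)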
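In a notebook you may prefer to apply such a limit only around one risky cell and restore it afterwards. The following is a rough sketch of that idea under several assumptions: it is Linux-only (it reads `/proc/self/status`), the helper names `current_vm_bytes`, `address_space_headroom` and `try_allocate` are invented for this example, and the 256 MiB headroom is arbitrary. Note that RLIMIT_AS counts virtual address space rather than resident memory, so libraries that reserve large mappings can hit the limit earlier than expected.

```python
import contextlib
import resource  # Unix-only module

MIB = 1024 * 1024


def current_vm_bytes():
    """Return this process's virtual memory size (VmSize) in bytes."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmSize:"):
                return int(line.split()[1]) * 1024  # the value is reported in kB
    raise RuntimeError("VmSize not found in /proc/self/status")


@contextlib.contextmanager
def address_space_headroom(extra_bytes):
    """Temporarily allow at most extra_bytes of additional address space."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_soft = current_vm_bytes() + extra_bytes
    # Never loosen a limit that is already in place.
    for cap in (soft, hard):
        if cap != resource.RLIM_INFINITY:
            new_soft = min(new_soft, cap)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    try:
        yield new_soft
    finally:
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))  # restore the old limit


def try_allocate(n_bytes):
    """Return True if n_bytes could be allocated, False if MemoryError was raised."""
    try:
        block = bytearray(n_bytes)
    except MemoryError:
        return False
    del block
    return True


if __name__ == "__main__":
    with address_space_headroom(256 * MIB):
        print(try_allocate(1024 * MIB))  # request exceeds the headroom
```

Inside the with block, an oversized allocation should surface as a MemoryError in the cell instead of pushing the machine into swap.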

Gray · Answer 6 · 2019-10-17T05:34:38.600

There is no reason to view the entire output of a large dataframe. Viewing or manipulating large dataframes will unnecessarily use large amounts of your computer resources.

Whatever you are doing can be done in miniature. It's far easier to write code and manipulate data when the data frame is small. The best way to work with big data is to create a new data frame that takes only a small portion, or a small sample, of the large data frame. Then you can explore the data and do your coding on the smaller data frame. Once you have explored the data and got your code working, just use that code on the larger data frame.

The easiest way is simply to take the first n rows of the data frame using the head() function, which returns only the first n rows. You can create a mini data frame by using the head function on the large data frame. Below I select the first 50 rows and assign them to small_df. This assumes that BigData is a dataset that comes from a library you loaded for this project.

library(namedPackage) 

df <- data.frame(BigData)                #  Assign big data to df
small_df <- head(df, 50)         #  Assign the first 50 rows to small_df

This will work most of the time, but sometimes the big data frame comes with presorted variables or with variables already grouped. If the big data is like this, then you would need to take a random sample of the rows from the big data. Then use the code that follows:

df <- data.frame(BigData)

set.seed(1016)                                          # set your own seed

df_small <- df[sample(nrow(df),replace=F,size=.03*nrow(df)),]     # samples 3% rows
df_small                                                         # much smaller df
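Since the question is about pandas, the same two ideas can be sketched in Python. This is an illustrative equivalent rather than part of the original answer: it assumes pandas is installed, and the helper names and the toy 1000-row frame are made up for the example.

```python
import pandas as pd


def first_rows(df, n=50):
    """Return the first n rows, like head(df, 50) in the R code above."""
    return df.head(n)


def random_rows(df, frac=0.03, seed=1016):
    """Return a random sample of rows without replacement (3% by default)."""
    return df.sample(frac=frac, replace=False, random_state=seed)


if __name__ == "__main__":
    big_df = pd.DataFrame({"x": range(1000)})  # toy stand-in for the real data
    print(len(first_rows(big_df)))
    print(len(random_rows(big_df)))
```

Fixing `random_state` plays the role of set.seed() in the R version, so the same rows are drawn on every run.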
