
I'm simply loading a CSV file in JupyterLab as follows:

import pandas as pd

data = pd.read_csv('data_simple.csv')

The file is around 300 MB, so when I load it, memory usage increases significantly, by roughly 500 MB. That's fine.

But when I run the exact same cell again, memory usage grows by about as much as the first time, and it keeps growing every time I re-run the cell.

Why does this happen? I'm loading it into the same variable, data. Shouldn't the old data be freed when the name is re-assigned? Where does the old data go if it all stays in memory? I have tried to Google it but couldn't find anything except this. Thanks.
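For what it's worth, re-assigning a name normally does free the old object once nothing else references it; the catch in IPython/Jupyter is that the shell itself can keep extra references alive. Here is a minimal sketch to watch the process's memory across re-runs (it assumes the psutil package is installed; data_simple.csv is the file from above):

import gc
import pandas as pd
import psutil

def rss_mb():
    # Resident set size of the current process, in MB
    return psutil.Process().memory_info().rss / 1e6

print(f"before: {rss_mb():.0f} MB")
data = pd.read_csv('data_simple.csv')
print(f"after 1st load: {rss_mb():.0f} MB")

# Re-binding drops the old DataFrame's refcount; if nothing else
# (e.g. IPython's Out cache or the _ history variables) still holds
# a reference, the memory becomes reclaimable.
data = pd.read_csv('data_simple.csv')
gc.collect()
print(f"after 2nd load: {rss_mb():.0f} MB")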

emremrah
  • The answer is in the very link you posted in the question. The notebook stores all output of the executed cells in Out[n]. So if you run a cell once, the result will be stored in Out[1]; if you run it another time, the second result will be in Out[2]. – dodekja Apr 10 '20 at 10:50
  • I thought that was because of the library `holoviews`. Do you know how to make Jupyter not do that if I assign to the same variable? – emremrah Apr 10 '20 at 10:54
  • I believe there was a keyboard shortcut for it, but [this question might help you](https://stackoverflow.com/questions/24816237/ipython-notebook-clear-cell-output-in-code) – dodekja Apr 10 '20 at 11:01
  • Do you mean just right clicking and `clear outputs` would work? Because it didn't. – emremrah Apr 10 '20 at 11:33
  • The [answers to this question](https://stackoverflow.com/questions/16261240/releasing-memory-of-huge-numpy-array-in-ipython) describe ways to clear the Out output cache. Those should work (a short sketch follows these comments). I don't have access to Jupyter right now, so I can't try them myself. – dodekja Apr 10 '20 at 11:40
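A short sketch of the magics referenced above (both are standard IPython magics; whether the memory actually returns to the OS depends on what else still references the data):

# Delete the variable and purge IPython's internal references to it
# (output history, the _/__/___ shortcuts):
%xdel data

# Or clear the whole output history (the Out dict and the _N variables):
%reset -f out

# Then ask the garbage collector to reclaim what is now unreachable:
import gc
gc.collect()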

0 Answers