
I'm simply loading a CSV file in JupyterLab as follows:

import pandas as pd

data = pd.read_csv('data_simple.csv')

The file is around 300 MB, so when I load it, memory usage increases significantly, by roughly 500 MB. That's fine.

But when I run the exact same cell again, memory usage grows by about as much as the first time, and it keeps growing every time I re-run the cell.

Why does this happen? I'm loading it into the same variable, data. Shouldn't the old data be freed when the name is re-assigned? Where does the old data go if it all stays in memory? I have tried to Google it but couldn't find anything except this. Thanks.
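For what it's worth, re-assigning a name normally does free the old object once nothing else references it; the catch in IPython/Jupyter is that the shell itself can keep extra references alive. Here is a minimal sketch to watch the process's memory across re-runs (it assumes the psutil package is installed; data_simple.csv is the file from above):

import gc
import pandas as pd
import psutil

def rss_mb():
    # Resident set size of the current process, in MB
    return psutil.Process().memory_info().rss / 1e6

print(f"before: {rss_mb():.0f} MB")
data = pd.read_csv('data_simple.csv')
print(f"after 1st load: {rss_mb():.0f} MB")

# Re-binding drops the old DataFrame's refcount; if nothing else
# (e.g. IPython's Out cache or the _ history variables) still holds
# a reference, the memory becomes reclaimable.
data = pd.read_csv('data_simple.csv')
gc.collect()
print(f"after 2nd load: {rss_mb():.0f} MB")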

emremrah
  • The answer is in the very link you posted in the question. The notebook stores all output of the executed cells in Out[n]. So if you run a cell once, the result will be stored in Out[1]; if you run it another time, the second result will be in Out[2]. – dodekja Apr 10 '20 at 10:50
  • I thought that was because of the library `holoviews`. Do you know how to make Jupyter not do that if I assign to the same variable? – emremrah Apr 10 '20 at 10:54
  • I believe there was a keyboard shortcut for it, but [this question might help you](https://stackoverflow.com/questions/24816237/ipython-notebook-clear-cell-output-in-code) – dodekja Apr 10 '20 at 11:01
  • Do you mean just right clicking and `clear outputs` would work? Because it didn't. – emremrah Apr 10 '20 at 11:33
  • The [answers to this question](https://stackoverflow.com/questions/16261240/releasing-memory-of-huge-numpy-array-in-ipython) describe ways to clear the Out output cache. Those should work (a short sketch follows these comments). I don't have access to Jupyter right now, so I can't try them myself. – dodekja Apr 10 '20 at 11:40
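A short sketch of the magics referenced above (both are standard IPython magics; whether the memory actually returns to the OS depends on what else still references the data):

# Delete the variable and purge IPython's internal references to it
# (output history, the _/__/___ shortcuts):
%xdel data

# Or clear the whole output history (the Out dict and the _N variables):
%reset -f out

# Then ask the garbage collector to reclaim what is now unreachable:
import gc
gc.collect()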

0 Answers