
I'm processing customer and purchase data in a Jupyter Notebook. I was writing and executing code on it comfortably, but all of a sudden it slowed down and now takes forever to execute even a single simple statement like print('A'). The worst part is that it doesn't show any error, so I have absolutely no idea what's wrong with Jupyter Notebook or my code.

The original data is fairly large. I merged two data sets: one has 424,699 rows and 22 columns, and the other has 4,308,392 rows and 39 columns.

The versions:
Python → 3.7.4
Jupyter Notebook → 6.0.0
Windows 10 Pro

I just want to speed up execution in Jupyter Notebook again.

Pablito
  • Are you sure it didn't start some synchronous process? – WiseDev Aug 15 '19 at 06:07
  • You may have hit a memory limit, and now there's a lot of disk-memory swapping going on; it's hard to tell. If you're done with the current session, you may want to quit the notebook and start a new one. With a fresh one, you can carefully test whether reading and merging all the data is causing your slow-down. – 9769953 Aug 15 '19 at 06:08
  • Thank you, 0 0! I will try again in a new notebook! – Pablito Aug 15 '19 at 06:54

1 Answer


Your memory usage has probably gotten quite high, and the Jupyter notebook slows down because it starts swapping to your hard disk. There is also a risk that it will crash soon.

Try to clean up all the data you no longer need. If you do not need a dataset after the merge, delete it. See: How to delete multiple pandas (python) dataframes from memory to save RAM?

import pandas as pd

a, b, c = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
lst = [a, b, c]
del a, b, c  # the dataframes are still referenced by the list
del lst      # now nothing references them, so the memory can be released

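In the same spirit, here is a minimal sketch for this case (the names cst and trn are only placeholders for the two large source tables, and gc.collect() is just an extra nudge to release the freed memory right away):

import gc
import pandas as pd

# small placeholder frames standing in for the two large source tables
cst = pd.DataFrame({'cst_id': [1, 2, 3], 'name': ['a', 'b', 'c']})
trn = pd.DataFrame({'cst_id': [1, 2, 2], 'amount': [10.0, 5.5, 7.2]})

# keep only the merged result and drop the originals
whole = cst.merge(trn, on='cst_id', how='inner')
del cst, trn   # drop the references to the source frames
gc.collect()   # ask the garbage collector to free the memory now
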
This thread gives an idea of how to track your memory and CPU usage in Python: How to get current CPU and RAM usage in Python?

#!/usr/bin/env python
import psutil
# gives a single float value
psutil.cpu_percent()
# gives an object with many fields
psutil.virtual_memory()
# you can convert that object to a dictionary 
dict(psutil.virtual_memory()._asdict())
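
If you want to see how much of that memory the notebook kernel itself is holding (rather than the machine as a whole), psutil can also report the resident memory of the current process; a small sketch:

import psutil

# resident set size (RSS) of the current Python process, in bytes
proc = psutil.Process()
rss_gb = proc.memory_info().rss / 1024 ** 3
print(f"this kernel is using about {rss_gb:.2f} GB")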

Here is also an overview of how much memory the different data types use, depending on your system: In-memory size of a Python structure
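
As a rough sketch of that idea (the column names here are made up): pandas reports the per-column footprint with memory_usage(deep=True), and numeric columns can often be downcast while repetitive string columns fit into the category dtype:

import pandas as pd

df = pd.DataFrame({
    'cst_id': [1, 2, 3, 1],                          # int64 by default
    'amount': [10.0, 5.5, 7.2, 3.3],                 # float64 by default
    'region': ['north', 'south', 'north', 'north'],  # object (Python strings)
})

print(df.memory_usage(deep=True))  # bytes per column before shrinking

# downcast numbers and turn repetitive strings into categories
df['cst_id'] = pd.to_numeric(df['cst_id'], downcast='unsigned')
df['amount'] = pd.to_numeric(df['amount'], downcast='float')
df['region'] = df['region'].astype('category')

print(df.memory_usage(deep=True))  # usually much smaller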

PV8
  • Thank you for your answer, PV8! I executed the code you provided and got the following result: {'total': 34128306176, 'available': 9679208448, 'percent': 71.600, 'used': 24449097728, 'free': 9679208448} What can you tell from this result? Have I used that much memory? – Pablito Aug 15 '19 at 06:49
  • Yes, you are already using 24 GB of memory; you can check the documentation for further details: https://psutil.readthedocs.io/en/latest/ – PV8 Aug 15 '19 at 07:00
  • Okay, thanks! I tried to delete four dataframes in the notebook by writing `del cst, trn, trn_tmp, cst2`, but the last piece of code I ran still hasn't finished. Is the way I tried it wrong, do you think? Or is there another way to solve this? – Pablito Aug 15 '19 at 07:20
  • If you really need to work with this dataframe, you can try converting the columns to different types based on their memory use: https://stackoverflow.com/questions/1331471/in-memory-size-of-a-python-structure But do you really need 22 columns? Try dropping some columns as well. – PV8 Aug 15 '19 at 07:43
  • Hi! I just tried a couple of things to release the memory and managed to reduce the used memory by 6 GB. However, the code still doesn't finish. What do you think about this situation? I don't think I can reduce it any further. – Pablito Aug 16 '19 at 02:35
  • Also, do you think the code below is inefficient, and is there a better way? `whole[whole['cst_id'].map(lambda x: x in ppl['cst_id'].tolist())]` What I want to do is extract the rows of "whole" whose 'cst_id' matches one in "ppl". – Pablito Aug 16 '19 at 02:48
  • Working with vectorized solutions and avoiding loops is always a good idea. – PV8 Aug 16 '19 at 05:54
  • I experienced a severe slowdown and finally noticed that this command was responsible: `pd.options.display.max_columns = None`. It is weird, but it made the Jupyter notebook very slow. I changed `None` to `100` and it was sorted out. – Mehdi Oct 18 '22 at 20:44
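
Regarding the lookup discussed in the comments above: `whole[whole['cst_id'].map(lambda x: x in ppl['cst_id'].tolist())]` rebuilds the list and scans it once per row, which is very slow on millions of rows. A vectorized alternative is pandas' isin; a small sketch with made-up data:

import pandas as pd

whole = pd.DataFrame({'cst_id': [1, 2, 3, 4], 'amount': [10.0, 5.5, 7.2, 1.1]})
ppl = pd.DataFrame({'cst_id': [2, 4]})

# keep only the rows of `whole` whose cst_id also appears in `ppl`
subset = whole[whole['cst_id'].isin(ppl['cst_id'])]
print(subset)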