I want to read a huge CSV file with pandas' `read_csv` and show a progress bar, since the read takes a long time. Is there a way to do this? So far I have only seen examples that use loops.
-
What is wrong with using loops? – gofvonx Apr 07 '21 at 10:32
-
I usually don't need loops to use read_csv. I have found a way using chunks. – Jess Apr 07 '21 at 12:47
1 Answer
Yes. You could abuse any of the arguments that accept a callable and have it called for each row:

    import pandas as pd
    from tqdm.auto import tqdm

    with tqdm() as bar:
        # do not skip any of the rows, but update the progress bar instead
        df = pd.read_csv('data.csv', skiprows=lambda x: bar.update(1) and False)

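If you use Linux and run this in IPython or Jupyter (the `!` shell syntax below is not plain Python), you can pass the total number of lines to get a more meaningful progress bar:

    import pandas as pd
    from tqdm.auto import tqdm

    # IPython/Jupyter shell capture; the result is a list of output lines
    lines_number = !cat 'data.csv' | wc -l
    with tqdm(total=int(lines_number[0])) as bar:
        df = pd.read_csv('data.csv', skiprows=lambda x: bar.update(1) and False)
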
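Outside IPython, or on a system without `wc`, the line count can be computed in plain Python instead. A minimal sketch, where `count_lines` and its `buffer_size` parameter are hypothetical names for illustration and not part of pandas or tqdm. It counts newline bytes, so like `wc -l` it reports one less than the number of lines when the last line has no trailing newline:

```python
def count_lines(path, buffer_size=1024 * 1024):
    """Count newline bytes in a file, matching what `wc -l` reports."""
    total = 0
    with open(path, "rb") as f:
        # Read fixed-size binary blocks so the whole file never sits in memory.
        while True:
            block = f.read(buffer_size)
            if not block:
                break
            total += block.count(b"\n")
    return total
```

The result could then be passed as `total=count_lines('data.csv')` to `tqdm` in place of the shell command.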
But if you do not like for-loops, you may also dislike context managers. You could get away with:

    def none_but_please_show_progress_bar(*args, **kwargs):
        # note: this bar is never closed explicitly
        bar = tqdm(*args, **kwargs)

        def checker(x):
            bar.update(1)
            return False

        return checker

    df = pd.read_csv('data.csv', skiprows=none_but_please_show_progress_bar())

But I find it less stable; I recommend the context-manager-based approach.
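For comparison, the chunk-based route the asker mentions in the comments avoids the callable trick altogether. The following is only a sketch of that idea, not the approach recommended above: `read_csv_with_progress` is a hypothetical helper name, the default `chunksize` is an arbitrary choice, and the code assumes the file has at least one data row. It relies on `read_csv` returning an iterator of DataFrames when `chunksize` is set:

```python
import pandas as pd
from tqdm.auto import tqdm


def read_csv_with_progress(path, chunksize=100_000, **read_csv_kwargs):
    """Read a CSV in chunks, advancing a tqdm bar by the rows in each chunk."""
    chunks = []
    with tqdm(unit="rows") as bar:
        # With chunksize set, read_csv yields DataFrames of up to chunksize rows.
        for chunk in pd.read_csv(path, chunksize=chunksize, **read_csv_kwargs):
            chunks.append(chunk)
            bar.update(len(chunk))
    # Stitch the pieces back into one DataFrame with a fresh 0..n-1 index.
    return pd.concat(chunks, ignore_index=True)
```

This does use a loop, but it is hidden inside the helper, so the call site stays a one-liner such as `df = read_csv_with_progress('data.csv')`. The trade-off is that the bar advances once per chunk rather than once per row.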

krassowski
-
With the first approach I get `HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))` but no progress bar, and the last approach just restarts my kernel. – Jess Apr 07 '21 at 18:44
-
This means you have not installed the extension that enables widgets (including the nice tqdm bar) in JupyterLab notebooks. For JupyterLab 3.0 just do `pip install ipywidgets` and restart JupyterLab (see https://stackoverflow.com/questions/57343134/jupyter-notebooks-not-displaying-progress-bars). – krassowski Apr 07 '21 at 18:51
-
I said I do not recommend the last approach ;) But seriously, if this causes a kernel restart, that is a bug and it would be very helpful (for yourself and everyone else) if you could report it to the repository of the kernel you use. – krassowski Apr 07 '21 at 18:52
-
If you cannot get the nice widget working, you can always just use `from tqdm import tqdm` instead of `from tqdm.auto import tqdm`. – krassowski Apr 07 '21 at 18:59