I want to read a huge CSV file with pandas' read_csv and show a progress bar, because loading takes a long time. Is there a way to do this? So far I have only seen examples that use loops.

Jess

1 Answer

Yes. You could abuse one of the arguments that accept a callable, so that the callable is invoked once per row:

import pandas as pd
from tqdm.auto import tqdm

with tqdm() as bar:
    # skiprows is called once per row index; the expression below is always
    # falsy, so no row is skipped, but the progress bar advances by one
    df = pd.read_csv('data.csv', skiprows=lambda x: bar.update(1) and False)
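
Why the lambda never skips a row can be checked without pandas or tqdm. The sketch below uses a hypothetical stand-in class, `FakeBar` (not part of tqdm), whose `update()` returns either `None` or `True`, which is what tqdm's `update()` is assumed to return here. In both cases `... and False` evaluates to a falsy value, so a `skiprows` callable built this way would keep every row.

```python
class FakeBar:
    """Hypothetical stand-in for a tqdm bar; it only counts update() calls."""

    def __init__(self):
        self.n = 0

    def update(self, step=1):
        self.n += step
        # Assumption: like tqdm, return True on some calls and None on others.
        return True if self.n % 2 == 0 else None


bar = FakeBar()

# Same shape as the skiprows callable: advance the bar, then report "keep row".
keep_row = lambda x: bar.update(1) and False

# Evaluate the callable for four row indices.
results = [keep_row(i) for i in range(4)]

# None and False are both falsy, so no row index is marked for skipping.
print(results, bar.n)
```

The side effect (advancing the bar) happens on every call, while the return value stays falsy regardless of what `update()` returns.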

If you use Linux and run this in IPython or Jupyter (the `!` shell syntax below is not plain Python), you can count the lines first to get a more meaningful progress bar:

import pandas as pd
from tqdm.auto import tqdm

# IPython shell capture: the result is a list of output lines
lines_number = !cat 'data.csv' | wc -l

with tqdm(total=int(lines_number[0])) as bar:
    df = pd.read_csv('data.csv', skiprows=lambda x: bar.update(1) and False)
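
The line count above depends on IPython's `!` syntax and the `wc` tool. As a rough, portable alternative, the sketch below counts newline characters in plain Python, reading the file in binary chunks so a huge file is never loaded at once. The function name `count_lines` and the `chunk_size` default are illustrative choices, not part of pandas or tqdm.

```python
def count_lines(path, chunk_size=1024 * 1024):
    """Count newline characters in a file, reading it in fixed-size chunks."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # End of file reached.
                break
            total += chunk.count(b"\n")
    return total
```

The result could then be passed as `tqdm(total=count_lines('data.csv'))`. Like `wc -l`, this counts newline characters, so a final line without a trailing newline is not included.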

If you dislike context managers as much as for-loops, you could get away with a closure instead:

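def none_but_please_show_progress_bar(*args, **kwargs):
    # the bar is created here and never explicitly closed
    bar = tqdm(*args, **kwargs)

    def checker(x):
        # advance the bar for each row index and never skip the row
        bar.update(1)
        return False

    return checker

df = pd.read_csv('data.csv', skiprows=none_but_please_show_progress_bar())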

However, I find this less stable and recommend the context-manager approach.

krassowski
  • With the first approach I get: HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value=''))) but no progress bar, and the last approach just restarts my kernel. – Jess Apr 07 '21 at 18:44
  • This means you have not installed the extension that enables widgets (including the nice tqdm bar) in JupyterLab notebooks. For JupyterLab 3.0 just do `pip install ipywidgets` and restart JupyterLab (see https://stackoverflow.com/questions/57343134/jupyter-notebooks-not-displaying-progress-bars). – krassowski Apr 07 '21 at 18:51
  • I said I do not recommend the last approach ;) But seriously, if this causes a kernel restart, that is a bug and it would be very helpful (for yourself and everyone else) if you could report it to the repository of the kernel you use. – krassowski Apr 07 '21 at 18:52
  • If you cannot get the nice widget working, you can always just use `from tqdm import tqdm` instead of `from tqdm.auto import tqdm` – krassowski Apr 07 '21 at 18:59