I am trying to add a progress bar to my code while it reads my CSV file (and I would like to add one to my other functions too).

However, I am not sure how to fit it into my reading code: the bar keeps animating and never completes.

import pandas as pd
from alive_progress import alive_bar
import time

with alive_bar(100, theme='ascii') as bar:

    file = pd.read_csv('file.csv', 
                        sep = ';', 
                        skiprows = 56,
                        parse_dates = [['Date','Time']])
    bar()

Also, how would I apply a progress bar to a for loop?

Sultry T.
    As far as I can tell, you need to call `bar()` every time you want to advance the progress bar by 1. So if your `total` is 100, you have to call `bar()` 100 times. In your example, reading the CSV file is a single operation, so the bar only advances by 1. You can read the CSV file in chunks instead and call `bar()` once per chunk; `total` would then be the number of chunks. Here is how to read in chunks: https://stackoverflow.com/questions/25962114/how-do-i-read-a-large-csv-file-with-pandas – marke Oct 01 '21 at 12:57

2 Answers

How do I add a progress bar to this?

In general with progress bars you need some way of adding a hook to the actual read loop. In this case I would simply not bother: if you're going to use a high-level library like pandas, presumably it's because you don't want to manage the whole reading-parsing loop yourself.

How do I use a for loop?

This is much easier. From the docs:

from alive_progress import alive_it

for item in alive_it(items):   # <<-- wrapped items
    print(item)                # process each item

Why doesn't my bar update?

Because bar() is the function that updates the bar, and you only call it once. alive_progress isn't magic: if you tell it you will need 100 iterations, it expects you to call bar() 100 times. Each call moves the bar 1/100th forward, and from the time between calls to bar() it calculates how fast you are going and how long you are likely to wait.

2e0byo

You'd have to parse the file in chunks and get the number of lines beforehand to calculate the total number of chunks:

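import math
import pandas as pd
from alive_progress import alive_bar

filepath = "file.csv"
chunksize = 5000

# Count the data rows up front (subtract 1 for the header line)
with open(filepath, 'r') as f:
    num_rows = sum(1 for _ in f) - 1

reader = pd.read_csv(filepath, chunksize=chunksize)

# Round up so that a final, partial chunk is counted too
with alive_bar(math.ceil(num_rows / chunksize)) as bar:
    for chunk in reader:
        process_chunk(chunk)  # placeholder: replace with your own processing
        bar()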

Counting the rows means an extra pass over the whole file, of course, so I'd only recommend this if the processing takes much longer than the reading itself and you absolutely have to have a progress bar.

Lukas