
Is it possible to use a TQDM progress bar when importing and indexing large datasets using Pandas?

Here is an example of some 5-minute data I am importing, indexing, and converting with to_datetime. It takes a while, and it would be nice to see a progress bar.

import pandas as pd

# Import the CSV file into a pandas DataFrame, convert 'Gmt time' to pandas datetimes and set it as the index
eurusd_ask = pd.read_csv('EURUSD_Candlestick_5_m_ASK_01.01.2012-05.08.2017.csv')
eurusd_ask.index = pd.to_datetime(eurusd_ask.pop('Gmt time'))
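One way to get a progress bar during the import itself is to read the CSV in chunks and advance the bar by the size of each chunk. This is a minimal sketch, assuming the file has one record per line (no quoted fields containing newlines) so that a quick line count gives the bar's total. The helper name read_csv_with_progress, the chunk sizes and the sample data are hypothetical, and the date format string is an assumption about the file's 'Gmt time' layout.

```python
import os
import tempfile

import pandas as pd
from tqdm import tqdm


def read_csv_with_progress(path, chunksize=10_000, **read_csv_kwargs):
    """Read a CSV in chunks, advancing a tqdm bar by the number of rows in each chunk."""
    # Hypothetical helper. One extra pass over the file counts data rows (minus the header).
    # Assumes one record per line, i.e. no quoted fields that contain newlines.
    with open(path) as f:
        total_rows = sum(1 for _ in f) - 1

    chunks = []
    with pd.read_csv(path, chunksize=chunksize, **read_csv_kwargs) as reader, \
            tqdm(total=total_rows, desc="Reading CSV", unit="rows") as pbar:
        for chunk in reader:
            chunks.append(chunk)
            pbar.update(len(chunk))  # advance by rows read, not by 1
    return pd.concat(chunks, ignore_index=True)


# Hypothetical sample data standing in for the EURUSD file from the question.
sample = pd.DataFrame({
    "Gmt time": pd.date_range("2012-01-01", periods=25, freq="5min").strftime("%d.%m.%Y %H:%M:%S.000"),
    "Open": range(25),
})

with tempfile.TemporaryDirectory() as tmpdir:
    csv_path = os.path.join(tmpdir, "sample.csv")
    sample.to_csv(csv_path, index=False)
    eurusd_ask = read_csv_with_progress(csv_path, chunksize=10)

# Same indexing step as in the question; the format string is an assumption about the file.
eurusd_ask.index = pd.to_datetime(eurusd_ask.pop("Gmt time"), format="%d.%m.%Y %H:%M:%S.%f")
```

Larger chunks mean fewer bar updates and less overhead; the final pd.concat briefly needs memory for both the chunks and the combined frame.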
sslack88

5 Answers

Find the length from the DataFrame's shape and pass it to tqdm as total:

from tqdm import tqdm

for index, row in tqdm(df.iterrows(), total=df.shape[0]):
    print("index", index)
    print("row", row)
Arjun Kava
from tqdm import tqdm

with tqdm(total=Df.shape[0]) as pbar:
    for index, row in Df.iterrows():
        pbar.update(1)  # advance the bar by one row
        ...  # process the row here
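The manual pbar.update pattern above also works for steps that are not row loops, such as the question's pd.to_datetime call: convert the column in slices and advance the bar by the size of each slice. This is a sketch under assumptions; the helper name to_datetime_with_progress, the slice size and the sample column are hypothetical, and the format string assumes the 'Gmt time' layout.

```python
import pandas as pd
from tqdm import tqdm


def to_datetime_with_progress(s, step=100_000, fmt=None):
    """Convert a Series of strings to datetimes slice by slice, updating a tqdm bar per slice."""
    # Hypothetical helper; an empty Series is converted directly since there is nothing to track.
    if len(s) == 0:
        return pd.to_datetime(s, format=fmt)
    parts = []
    with tqdm(total=len(s), desc="to_datetime", unit="rows") as pbar:
        for start in range(0, len(s), step):
            part = s.iloc[start:start + step]
            parts.append(pd.to_datetime(part, format=fmt))
            # Advance by the number of rows just converted rather than by 1.
            pbar.update(len(part))
    return pd.concat(parts)


# Hypothetical sample column laid out like the question's 'Gmt time' column.
gmt_time = pd.Series(
    pd.date_range("2012-01-01", periods=1_000, freq="5min").strftime("%d.%m.%Y %H:%M:%S.000"),
    name="Gmt time",
)
converted = to_datetime_with_progress(gmt_time, step=300, fmt="%d.%m.%Y %H:%M:%S.%f")
```

Each slice is still a vectorised call, so this keeps most of the speed of a single pd.to_datetime while giving step-sized progress updates.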
jmcgrath207

There is a workaround for tqdm > 4.24. As per https://github.com/tqdm/tqdm#pandas-integration:

import pandas as pd
from tqdm import tqdm

# Register `pandas.progress_apply` and `pandas.Series.map_apply` with `tqdm`
# (can use `tqdm_gui`, `tqdm_notebook`, optional kwargs, etc.)
tqdm.pandas(desc="my bar!")
eurusd_ask['t_stamp'] = eurusd_ask['Gmt time'].progress_apply(lambda x: pd.Timestamp(x))
eurusd_ask.set_index(['t_stamp'], inplace=True)
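A self-contained version of the progress_apply approach, using a tiny made-up sample so it can be run as-is. The rows and prices are hypothetical, and the explicit format string is an assumption about the file's 'Gmt time' layout; passing it avoids day/month ambiguity when each string is parsed on its own.

```python
import pandas as pd
from tqdm import tqdm

# Registers progress_apply / progress_map on pandas objects.
tqdm.pandas(desc="my bar!")

# Hypothetical stand-in for the question's data (values are made up).
eurusd_ask = pd.DataFrame({
    "Gmt time": ["01.01.2012 00:00:00.000", "01.01.2012 00:05:00.000", "01.01.2012 00:10:00.000"],
    "Open": [1.2934, 1.2936, 1.2935],
})

# Parse each string with a progress bar; the format string is an assumed layout.
eurusd_ask["t_stamp"] = eurusd_ask["Gmt time"].progress_apply(
    lambda x: pd.to_datetime(x, format="%d.%m.%Y %H:%M:%S.%f")
)
eurusd_ask = eurusd_ask.set_index("t_stamp")
```

Note the trade-off: parsing element by element in Python is much slower than one vectorised pd.to_datetime call over the whole column, so the bar makes progress visible but the step itself takes longer.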
Miguel Trejo
Zeke Arneodo

You could fill a pandas DataFrame line by line: read the file normally and add each line as a new row. This would be considerably slower than pandas' own reading methods, though.
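A minimal sketch of that idea, using the standard csv module on a file-like object. One deliberate change: rather than appending each line to the DataFrame (which copies data on every append), it collects the parsed rows in a list and builds the DataFrame once at the end. The function name and sample text are hypothetical, and csv.reader returns every value as a string, so numeric columns need converting afterwards.

```python
import csv
import io

import pandas as pd
from tqdm import tqdm


def read_rows_with_progress(f, total=None):
    """Parse CSV lines one at a time with a progress bar, then build a DataFrame once."""
    reader = csv.reader(f)
    header = next(reader)  # first line holds the column names
    rows = []
    for row in tqdm(reader, total=total, desc="Reading rows", unit="rows"):
        rows.append(row)  # every value is still a string here
    return pd.DataFrame(rows, columns=header)


# Hypothetical sample text in the question's layout.
text = "Gmt time,Open\n01.01.2012 00:00:00.000,1.2934\n01.01.2012 00:05:00.000,1.2936\n"
df = read_rows_with_progress(io.StringIO(text), total=2)
df["Open"] = df["Open"].astype(float)  # convert numeric columns explicitly
```

For a real file, open it with open(path, newline="") and pass that file object instead of the StringIO.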

ZeerakW

I find it very easy to implement. You only need to add the total argument: df.iterrows() returns a generator with no length, so tqdm cannot work out the total on its own.

import pandas as pd
from tqdm import tqdm

df = pd.read_excel(PATH_TO_FILE)

for index, row in tqdm(df.iterrows(), total=df.shape[0], desc='Reading DF'):
    print(row['df_colum'])
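A small check of why total matters: without it, tqdm tries len() on the iterable, which fails for the generator returned by iterrows(), so the bar has no total and can only show a count. The one-column DataFrame below is hypothetical and reuses the answer's df_colum name.

```python
import pandas as pd
from tqdm import tqdm

df = pd.DataFrame({"df_colum": ["a", "b", "c"]})  # hypothetical sample data

# iterrows() returns a generator with no len(), so tqdm cannot infer a total.
bar_without_total = tqdm(df.iterrows())
print(bar_without_total.total)  # None
bar_without_total.close()

# With total supplied, the bar knows it has df.shape[0] steps.
bar_with_total = tqdm(df.iterrows(), total=df.shape[0], desc="Reading DF")
values = [row["df_colum"] for _, row in bar_with_total]  # the bar closes when iteration ends
```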

85nd