
I could find examples of the tqdm progress bar being used for groupby and other pandas operations, but I couldn't find anything on merge or join.

Is it possible to use tqdm to show the progress of a pandas merge?

rahul
    @johny Mudly I've already seen that question. None of its answers has an example of a pandas merge/join operation. – rahul May 22 '19 at 12:55

1 Answer


tqdm integrates with pandas and supports various operations on it. To merge two large dataframes and show progress, you could do it this way:

import numpy as np
import pandas as pd
from tqdm import tqdm

df1 = pd.DataFrame({'lkey': 1000*['a', 'b', 'c', 'd'], 'lvalue': np.random.randint(0, int(1e8), 4000)})
df2 = pd.DataFrame({'rkey': 1000*['a', 'b', 'c', 'd'], 'rvalue': np.random.randint(0, int(1e8), 4000)})

# this is how you activate the pandas features in tqdm
tqdm.pandas()
# merge first, then call progress_apply with a dummy (identity) lambda;
# the bar tracks this apply step, which only starts after merge() returns
df1.merge(df2, left_on='lkey', right_on='rkey').progress_apply(lambda x: x)

More details are available on this thread: Progress indicator during pandas operations (python)
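To see what the bar in this answer actually counts: `DataFrame.progress_apply` follows the same `axis` rules as `DataFrame.apply`. With the default `axis=0` the function is called once per column, so on the merged frame the bar has only as many steps as there are columns; passing `axis=1` makes it step once per row. In both cases the merge itself has already finished before the bar appears. The sketch below uses smaller, made-up frames (an assumption, so the row-wise apply finishes quickly) to show both modes.

```python
import numpy as np
import pandas as pd
from tqdm import tqdm

# Register progress_apply and related methods on pandas objects
tqdm.pandas()

# Small illustrative frames (made-up data, kept small so axis=1 is fast)
rng = np.random.default_rng(0)
df1 = pd.DataFrame({'lkey': 50 * ['a', 'b', 'c', 'd'], 'lvalue': rng.integers(0, 100, 200)})
df2 = pd.DataFrame({'rkey': 50 * ['a', 'b', 'c', 'd'], 'rvalue': rng.integers(0, 100, 200)})

# The merge runs here with no progress reporting at all
merged = df1.merge(df2, left_on='lkey', right_on='rkey')

# axis=0 (default): one call per column, so the bar counts columns
per_column = merged.progress_apply(lambda col: col.size)

# axis=1: one call per row of the merged result
row_sums = merged.progress_apply(lambda row: row['lvalue'] + row['rvalue'], axis=1)
```

If the goal is to watch the merge itself, this approach cannot help, because nothing is reported until `merge()` returns.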

HMReliable
    I think it just shows the progress of the apply function rather than of the actual merge operation. – Kapil Feb 25 '20 at 09:38
    The only way I found is described here (it uses Dask as a workaround): https://stackoverflow.com/a/68936833/3921758 – DataScientYst Aug 26 '21 at 10:40
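If adding Dask is not an option, one pure-pandas workaround (a different technique from the answer above, not a feature tqdm provides) is to split the left frame into chunks, merge each chunk against the full right frame inside a `tqdm` loop, and concatenate the pieces. For inner and left joins, each left row's matches depend only on that row, so the chunked result contains the same rows as a single merge (row order may differ), and the bar advances while merging is actually happening. The helper name `merge_with_progress` and the default `n_chunks=20` are hypothetical choices for this sketch; expect some overhead from the repeated merges and the final `concat`.

```python
import numpy as np
import pandas as pd
from tqdm import tqdm


def merge_with_progress(left, right, n_chunks=20, how='inner', **merge_kwargs):
    """Hypothetical helper: merge `left` with `right` chunk by chunk,
    advancing a tqdm bar once per chunk of `left`.

    Only 'inner' and 'left' joins are supported; for 'right' or 'outer',
    unmatched right rows would be repeated once per chunk.
    """
    if how not in ('inner', 'left'):
        raise ValueError("chunked merge only supports how='inner' or how='left'")
    # Split the row positions of `left` into roughly equal chunks
    chunks = np.array_split(np.arange(len(left)), n_chunks)
    pieces = []
    for positions in tqdm(chunks, desc='merging'):
        # Each chunk is merged against the whole right frame
        pieces.append(left.iloc[positions].merge(right, how=how, **merge_kwargs))
    return pd.concat(pieces, ignore_index=True)


# Usage with frames shaped like the ones in the answer, but smaller
rng = np.random.default_rng(0)
df1 = pd.DataFrame({'lkey': 100 * ['a', 'b', 'c', 'd'], 'lvalue': rng.integers(0, int(1e8), 400)})
df2 = pd.DataFrame({'rkey': 100 * ['a', 'b', 'c', 'd'], 'rvalue': rng.integers(0, int(1e8), 400)})

result = merge_with_progress(df1, df2, n_chunks=20, left_on='lkey', right_on='rkey')
```

Choosing `n_chunks` is a trade-off: more chunks give a smoother bar but add per-merge overhead.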