There are a few questions about this, but I can't find one that shows a progress bar for a non-iterable function. Below is a function that merges two separate data frames. I'm hoping to wrap this function in another one that displays the progress.

from multiprocessing import Pool
import tqdm
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randint(0,100,size=(100000, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100000, 4)), columns=list('AXYZ'))

def merge_df(df1, df2):

    df = pd.merge(left = df1, right = df2, how = 'left',
    left_on = 'A', right_on = 'A')
    return df

if __name__ == '__main__':
   with Pool(2) as p:
      r = list(tqdm.tqdm(p.imap(merge_df, df1, df2)))

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

jonboy

2 Answers

from tqdm import tqdm
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randint(0,100,size=(100000, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100000, 4)), columns=list('AXYZ'))

# this is how you activate the pandas features in tqdm
tqdm.pandas()

# call the progress_apply feature with a dummy lambda
df1.merge(df2).progress_apply(lambda x: x)

For the above code to work, you need version 4.33.0 of tqdm. Uninstall the old version and install the new one with the commands below:

pip uninstall tqdm
pip install tqdm==4.33.0
Harish Vutukuri

One possible solution is to split one of your dataframes into chunks before merging:

dfs = np.split(df1, 100)   # or df2, depending on which side of the merge you split

Then merge the chunks one at a time (inside merge_df) and adapt a progress bar (such as the one proposed by @Greenstick at https://stackoverflow.com/a/34325723/4286380) to your problem.
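
A minimal sketch of the chunking idea, assuming a left merge on column 'A' as in the question and using tqdm for the bar rather than a hand-written one. The helper name merge_with_progress and its n_chunks parameter are hypothetical, introduced only for illustration. The left frame is sliced by row position, each slice is merged separately, and the bar advances once per slice:

```python
import numpy as np
import pandas as pd
from tqdm import tqdm


def merge_with_progress(left, right, on, n_chunks=100):
    """Left-merge `left` with `right` one chunk at a time, with a tqdm bar.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Row boundaries that cut `left` into n_chunks nearly equal slices.
    bounds = np.linspace(0, len(left), n_chunks + 1, dtype=int)
    pieces = []
    for start, stop in tqdm(zip(bounds[:-1], bounds[1:]), total=n_chunks):
        chunk = left.iloc[start:stop]
        # Each chunk is merged against the full right-hand frame.
        pieces.append(chunk.merge(right, how="left", on=on))
    return pd.concat(pieces, ignore_index=True)


# Smaller frames than in the question, to keep the example quick.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 100, size=(1000, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(rng.integers(0, 100, size=(1000, 4)), columns=list("AXYZ"))

merged = merge_with_progress(df1, df2, on="A", n_chunks=10)
```

The bar measures chunks completed, not rows, so it is only as fine-grained as n_chunks. Splitting the left frame suits a left merge because every left row ends up in exactly one chunk; for other join types the chunked result may differ from a single merge.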

Yassine