I need a tqdm progress bar over a (possibly long) chain of merge operations.
In my application, I have a cascade of operations like the following:
data = data.merge(get_data_source1(), on="id", how="left")\
.merge(get_data_source2(), on="id", how="left")\
...
.merge(get_data_sourceN(), on="id", how="left")
It is not relevant what the get_data_source<i> functions do: they pull data from somewhere (for instance, from different files or different DBs), each one returns a DataFrame with an "id" column, and each call takes a few seconds.
I need a single progress bar that advances with each of the N merges. This is probably feasible by wrapping each merge operation in a lambda function and putting the lambdas into an iterable, but that looks like an overengineered and hard-to-read solution to me (please correct me if you think I'm wrong).
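For reference, the lambda idea could look roughly like the following. This is only a minimal sketch: get_data is a hypothetical stand-in for the get_data_source<i> functions (with a shortened sleep), and the names sources, fetch and merged are my own choices for illustration.

```python
import time

import numpy as np
import pandas as pd
from tqdm import tqdm


def get_data(col):
    # Hypothetical stand-in for a slow data source returning an "id" column
    # plus one extra column named after `col`.
    time.sleep(0.1)
    return pd.DataFrame(np.random.randint(0, 100, size=(100, 2)), columns=["id", col])


data = pd.DataFrame(np.random.randint(0, 100, size=(100, 3)), columns=["id", "A", "B"])

# Wrap each source in a lambda so that nothing is fetched until the loop
# reaches it; `col=col` binds the current value at definition time.
sources = [lambda col=col: get_data(col) for col in ["C", "D", "E"]]

merged = data
# tqdm infers the total from len(sources), so the bar advances once per merge.
for fetch in tqdm(sources, desc="Merging"):
    merged = merged.merge(fetch(), on="id", how="left")
```

The bar ticks once per completed fetch-and-merge step, so it tracks N rather than the rows inside each merge.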
Also, I'm aware that it is possible to add a progress bar to each merge operation using the progress_apply function (as reported here), but that would generate several (N) short progress bars rather than a single one.
To emulate a working setup, let's consider this toy example:
import pandas as pd
import numpy as np
import time
data = pd.DataFrame(np.random.randint(0,100,size=(100,3)), columns=["id","A", "B"])
def get_data(col):
    time.sleep(1.0)
    return pd.DataFrame(np.random.randint(0,100,size=(100,2)), columns=["id",col])
data.merge(get_data("C"), on="id", how="left")\
.merge(get_data("D"), on="id", how="left")\
.merge(get_data("E"), on="id", how="left")\
.merge(get_data("F"), on="id", how="left")\
.merge(get_data("G"), on="id", how="left")\
.merge(get_data("H"), on="id", how="left")
What would be the best way to approach the problem?