I have a Python program that crunches a large dataset using pandas. It currently takes about 15 minutes to complete. I want to report the progress of the task (log it to stdout and send it as a metric to Datadog). Is there a way (or a function) to get the percent complete of the task? In the future, I might be dealing with larger datasets. The task is a simple grouping of a large pandas data frame, something like this:
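    import pandas as pd

    dfDict = {}
    for cat in categoryList:
        # Copy the filtered rows so the assignment below does not write to a view of df
        df1 = df[df['category'] == cat].copy()
        if len(df1.index) > 0:
            # Convert only this category's dates, not the whole column of df
            df1[dateCol] = pd.to_datetime(df1[dateCol])
            dfDict[cat] = df1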
Here, categoryList has about 20,000 items, and df is a large data frame with (say) 5 million rows.
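To make the goal concrete, here is a minimal sketch of the kind of reporting I have in mind. The helper name `iter_with_progress`, the `report` callback, and the `every_pct` step are all made up for illustration and are not part of pandas or any library. It only counts how many categories have been processed, so it assumes each category takes roughly the same time.

```python
def iter_with_progress(items, report, every_pct=5):
    """Yield each item; call report(pct) whenever another every_pct percent is done.

    `report` is a hypothetical callback: it could print to stdout or
    forward the value to a metrics client.
    """
    total = len(items)
    next_mark = every_pct
    for done, item in enumerate(items, start=1):
        yield item  # the caller's loop body runs before the code below resumes
        pct = 100.0 * done / total
        if pct >= next_mark or done == total:
            report(pct)
            # Move the threshold past the percentage just reported
            while next_mark <= pct:
                next_mark += every_pct


if __name__ == "__main__":
    # Stand-in for categoryList; the real loop body would do the filtering
    categories = ["cat%d" % i for i in range(40)]
    for cat in iter_with_progress(categories, lambda p: print("%.0f%% complete" % p), every_pct=25):
        pass  # per-category work goes here
```

In my loop this would mean replacing `for cat in categoryList:` with `for cat in iter_with_progress(categoryList, report):`, where `report` is whatever function does the logging.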
I am not looking for anything fancy (like progress bars), just a percentage-complete value. Any ideas?
Thanks!