I have 8 functions that each run a query against a database and return a DataFrame.
I cannot execute them one after another, because I join the results at the end, and if the timestamps don't match I get null values. Each function queries a large chunk of data and takes time, so I want to run them in parallel.
def df1(domain, duration):
    # do something
    return df

def df2(domain, duration):
    # do something
    return df

def df3(domain, duration):
    # do something
    return df

def df4(domain, duration):
    # do something
    return df

def df5(domain, duration):
    # do something
    return df

def df6(domain, duration):
    # do something
    return df

def df7(domain, duration):
    # do something
    return df

def df8(domain, duration):
    # do something
    return df
def final_df(domain, duration):
    df = pd.concat([df1(domain, duration),
                    df2(domain, duration),
                    df3(domain, duration),
                    df4(domain, duration),
                    df5(domain, duration),
                    df6(domain, duration),
                    df7(domain, duration),
                    df8(domain, duration)
                    ], axis=1, sort=False).reset_index()
    df = df.set_index('time')
    return df

df = final_df(domain, duration)
I want to call all 8 functions (df1 through df8) inside the final_df function in parallel.
P.S.: I am familiar with multiprocessing, but I don't just want to run the functions in parallel; I also need to collect and save each one's result.
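For reference, here is a minimal sketch of the kind of thing I mean. It assumes the queries are I/O-bound, so a thread pool can run them concurrently without the pickling overhead of multiprocessing; the stub functions below are hypothetical stand-ins for the real df1..df8 query functions:

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Hypothetical stand-ins for the real df1..df8 DB-query functions.
def make_query(i):
    def query(domain, duration):
        # a real function would run its DB query here
        return pd.DataFrame({f"col{i}": [i, i + 1]},
                            index=pd.Index([0, 1], name="time"))
    return query

queries = [make_query(i) for i in range(8)]

def final_df(domain, duration):
    # Submit every query at once; each future holds one result,
    # so nothing is lost and the original order is preserved.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(q, domain, duration) for q in queries]
        frames = [f.result() for f in futures]
    # Concatenating on the shared 'time' index joins the columns.
    return pd.concat(frames, axis=1, sort=False)

df = final_df("example.com", "1h")
```

The `futures` list is what keeps the results: `f.result()` blocks until that query finishes and returns its DataFrame, so the frames can be concatenated exactly as in the sequential version.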