I am using read_csv() to read a long list of CSV files and return two dataframes. I have managed to speed this step up by using dask. Unfortunately, I have not been able to return multiple values from a delayed function when using dask.
The minimal working example below replicates my issue:
import pandas as pd
import dask.dataframe as dd
from dask import delayed

@delayed(nout=2)
def function(a):
    d = 0
    c = a + a
    if a > 4:  # arbitrary condition to make c and d of different lengths
        d = a * a
    return pd.DataFrame([c])  # , pd.DataFrame([d])

values = [1, 2, 3, 4, 5]
dfs = [function(value) for value in values]  # the decorator already makes function delayed
ddf = dd.from_delayed(dfs)
ddf.compute()
Any ideas on how to resolve this issue are appreciated. Thanks.