Similar unanswered question: Row by row processing of a Dask DataFrame
I'm working with dataframes that are millions of rows long, so I'm now trying to have all dataframe operations performed in parallel. One such operation I need converted to Dask is:
    for row in df.itertuples():
        ratio = row.ratio
        tmpratio = row.tmpratio
        tmplabel = row.tmplabel
        if tmpratio > ratio:
            df.loc[row.Index, 'ratio'] = tmpratio
            df.loc[row.Index, 'label'] = tmplabel
What is the appropriate way to set a value by index in Dask, or to conditionally set values in rows? .loc doesn't support item assignment in Dask, and there does not appear to be a set_value, at[], or iat[] in Dask either.
I have attempted to use map_partitions with assign, but I am not seeing any ability to perform conditional assignment at the row-level.