Sorting in Dask

Based on this answer, I want to build the combined sort column dynamically. What I have now is:

df_post['sort_column'] = df_post.apply(lambda r:str([r[col1],r[col2],r[col3]]), axis=1)
df_post = df_post.set_index('sort_column')
df_post = df_post.map_partitions(lambda x: x.sort_index())

I cannot figure out how to make the '[r[col1],r[col2],r[col3]]' part dynamic, so that it is built from a list of column names supplied by a config file.

1 Answer

It is tricky to tell what the question is after, but I assume it is "I would like to apply the solution in the linked answer, but for a list of column names". That can look like

df_post['sort_column'] = df_post.apply(lambda r:str([r[c] for c in columns]), axis=1)
df_post = df_post.set_index('sort_column')
df_post = df_post.map_partitions(lambda x: x.sort_index())

where columns is the list of column names read from the config file beforehand.
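As a minimal sketch of the "obtained from the config file" step, assuming the config is JSON and stores the names under a key called sort_columns (both are assumptions, since the question does not show the config format), the key-building logic can be pulled into a small helper. The names load_sort_columns and make_sort_key are hypothetical, and a plain dict stands in for a DataFrame row so the snippet runs without Dask:

```python
import json

# Hypothetical config contents; the real file name, format and key
# are not given in the question.
CONFIG_TEXT = '{"sort_columns": ["col1", "col2", "col3"]}'


def load_sort_columns(config_text, key="sort_columns"):
    """Return the list of column names stored under `key` in a JSON config."""
    return list(json.loads(config_text)[key])


def make_sort_key(row, columns):
    """Build the same string key as the lambda above, for any mapping-like row."""
    return str([row[c] for c in columns])


columns = load_sort_columns(CONFIG_TEXT)

# A plain dict stands in for one DataFrame row here.
row = {"col1": "a", "col2": 2, "col3": 3.5, "other": "ignored"}
key = make_sort_key(row, columns)
print(key)
```

With a Dask DataFrame the helper would be used as df_post.apply(lambda r: make_sort_key(r, columns), axis=1). Dask will probably warn that it has to infer the output type unless a meta argument such as meta=('sort_column', 'object') is passed. Note also that the key is a string, so rows are ordered by string comparison: a numeric value of 10 sorts before 9.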

mdurant