I have a dask dataframe as below:
Column1 Column2 Column3 Column4 Column5
0 a 1 2 3 4
1 a 3 4 5
2 b 6 7 8
3 c 7 7
I want to merge all of the columns into a single column efficiently, so that each row becomes a single string, like below:
Merged_Column
0 a,1,2,3,4
1 a,3,4,5
2 b,6,7,8
3 c,7,7,7
I've seen this question, but it doesn't seem efficient since it uses the apply function. How can I achieve this as efficiently as possible (in terms of both speed and memory usage)? Or is apply not as problematic as I believe, since this is dask and not pandas?
This is what I tried. It seems to work, but I am worried about its speed on a large dataframe.
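cols = list(df.columns)
# Join each row's values into one comma-separated string
df['combined'] = df[cols].apply(lambda row: ','.join(row.values.astype(str)), axis=1, meta=('combined', 'object'))
# Keep only the merged column
df = df.drop(cols, axis=1)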
I also need to get rid of the column header.
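For reference, here is a minimal sketch of an alternative I am considering instead of a row-wise apply. It is not a confirmed best practice. The helper name `merge_columns` is hypothetical, the sample values are stored as strings by assumption, and missing cells are assumed to be the trailing ones. The idea is to convert each partition to a single object array once and join the non-missing values of each row, which avoids building a pandas Series per row.

```python
import pandas as pd


def merge_columns(pdf: pd.DataFrame, sep: str = ",") -> pd.Series:
    """Join the non-missing values of each row into one string (hypothetical helper)."""
    rows = pdf.to_numpy(dtype=object)  # one object array, no per-row Series
    merged = [sep.join(str(v) for v in row if not pd.isna(v)) for row in rows]
    return pd.Series(merged, index=pdf.index, name="Merged_Column", dtype=object)


# Sample data mirroring the question; values are kept as strings (assumption)
# so that integers do not turn into floats such as "4.0" next to missing cells.
sample = pd.DataFrame(
    {
        "Column1": ["a", "a", "b", "c"],
        "Column2": ["1", "3", "6", "7"],
        "Column3": ["2", "4", "7", "7"],
        "Column4": ["3", "5", "8", None],
        "Column5": ["4", None, None, None],
    }
)
merged = merge_columns(sample)
print(merged.tolist())

# With dask (assuming `ddf` is the dask dataframe), the same helper would run per partition:
# ddf_merged = ddf.map_partitions(merge_columns, meta=("Merged_Column", "object"))
```

As far as I understand, the column header only shows up when the result is written out, so passing header=False to to_csv should drop it. Note that a comma-joined string written to a comma-separated file would be quoted by default.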