I have the dataframe below and need to collapse it into a single row in which all the marks values are concatenated with a delimiter such as a pipe.
For example: PACKAGING MARKS 3|PACKAGING MARKS 2|PACKAG.....

The number of marks records can vary for each mid.

mid  marksId  id  index  marks
2    3        3   2      PACKAGING MARKS 3
2    3        3   1      PACKAGING MARKS 2
2    3        3   0      PACKAGING MARKS 1
2    4        4   2      PACKAGING MARKS 23
2    4        4   1      PACKAGING MARKS 22
2    4        4   0      PACKAGING MARKS 21

Thanks

Ron

1 Answer

Assuming you want one delimited string for each "mid", you can collect all the "marks" values with collect_list() and join them with concat_ws():

import pyspark.sql.functions as F

df.groupby('mid').agg(F.concat_ws('|', F.collect_list('marks')).alias('marks_str')).show(truncate=False)
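
To sanity-check what the aggregation should return for the sample rows, here is a rough plain-Python model of the same group-and-join step (no Spark needed). The helper name concat_marks_by_mid is hypothetical and only for illustration. It keeps the rows in the order they were given, whereas Spark's collect_list() does not guarantee any particular order after a shuffle, so the Spark result may come out in a different order unless you sort explicitly.

```python
from collections import defaultdict

# Sample rows copied from the question: (mid, marksId, id, index, marks)
rows = [
    (2, 3, 3, 2, "PACKAGING MARKS 3"),
    (2, 3, 3, 1, "PACKAGING MARKS 2"),
    (2, 3, 3, 0, "PACKAGING MARKS 1"),
    (2, 4, 4, 2, "PACKAGING MARKS 23"),
    (2, 4, 4, 1, "PACKAGING MARKS 22"),
    (2, 4, 4, 0, "PACKAGING MARKS 21"),
]


def concat_marks_by_mid(rows, sep="|"):
    """Hypothetical helper: group rows by mid and join their marks with sep.

    Marks are joined in input order; this is an assumption of the sketch,
    not something Spark's collect_list() promises.
    """
    grouped = defaultdict(list)
    for mid, _marks_id, _id, _index, marks in rows:
        grouped[mid].append(marks)
    return {mid: sep.join(marks) for mid, marks in grouped.items()}


result = concat_marks_by_mid(rows)
print(result[2])
```

For mid 2 this prints all six marks values separated by pipes, starting with PACKAGING MARKS 3 and ending with PACKAGING MARKS 21.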
bzu