
I have an input DataFrame like the one below:

https://i.stack.imgur.com/QfcjV.png

which I want to transform into the output below:

[image: expected output]

Can anyone please help me with how to implement this using PySpark DataFrames?

I tried several approaches but could not find an efficient way to do this.

arthurq

1 Answer


Group by the common columns (here 'HCP ID' and 'TERR ID') and collect the values of the column that varies within each group ('PRODUCT') into a list.

import pyspark.sql.functions as F

# collect_list must be called through the F alias, since only the module was imported
ans_df = df.groupBy(F.col('HCP ID'), F.col('TERR ID')).agg(F.collect_list(F.col('PRODUCT')).alias("LINEUP"))
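
To make the intended result concrete without a Spark session, here is a minimal plain-Python sketch of what a group-by followed by a list collection computes. It is not PySpark code. The sample rows and the helper name `group_and_collect` are hypothetical, since the real data is only visible in the question's images.

```python
from collections import defaultdict

# Hypothetical sample rows as (HCP ID, TERR ID, PRODUCT) tuples;
# the actual values from the question are not available.
rows = [
    ("H1", "T1", "A"),
    ("H1", "T1", "B"),
    ("H2", "T2", "A"),
    ("H1", "T1", "A"),
]


def group_and_collect(rows):
    """Group rows by (HCP ID, TERR ID) and collect PRODUCT values into a list."""
    lineup = defaultdict(list)
    for hcp_id, terr_id, product in rows:
        # Duplicates are kept, mirroring a list (not set) collection.
        lineup[(hcp_id, terr_id)].append(product)
    return dict(lineup)


result = group_and_collect(rows)
for (hcp_id, terr_id), products in result.items():
    print(hcp_id, terr_id, products)
```

The sketch keeps duplicate products and preserves input order. In Spark, the order of elements produced by `collect_list` is not guaranteed after a shuffle, and if duplicates are unwanted, `F.collect_set` can be used in place of `F.collect_list`.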
user238607