I created a PySpark DataFrame using the following code:
testlist = [
    {"category": "A", "name": "A1"},
    {"category": "A", "name": "A2"},
    {"category": "B", "name": "B1"},
    {"category": "B", "name": "B2"},
]
spark_df = spark.createDataFrame(testlist)
Result:
category name
A A1
A A2
B B1
B B2
I want to group the rows by category and join the names into a single comma-separated string, so that the result looks like this:
category name
A A1, A2
B B1, B2
I tried the following code, which does not work:
spark_df.groupby('category').agg('name', lambda x:x + ', ')
Can anyone help me identify what I am doing wrong, and what the best way to achieve this is?