I have the following DataFrame in PySpark:
+----+-------+-----+
|name|subject|score|
+----+-------+-----+
| Tom|   math|   90|
| Tom|physics|   70|
| Amy|   math|   95|
+----+-------+-----+
I used the collect_list and struct functions from pyspark.sql.functions:
from pyspark.sql.functions import collect_list, struct
df.groupBy('name').agg(collect_list(struct('subject', 'score')).alias('score_list'))
to get the following DataFrame:
+----+--------------------+
|name|          score_list|
+----+--------------------+
| Tom|[[math, 90], [phy...|
| Amy|        [[math, 95]]|
+----+--------------------+
My question is: how can I convert the last column, score_list, into a string and write it to a CSV file that looks like this?
Tom (math, 90) | (physics, 70)
Amy (math, 95)
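To make the target format precise, here is a plain-Python sketch (not Spark code) of the line I want for each row. The helper name format_scores and the in-memory rows list are made up for illustration, and I am assuming a single space between the name and the first pair, as in the sample above.

```python
def format_scores(name, score_list):
    """Hypothetical helper: render one row as 'name (subject, score) | ...'."""
    # Each (subject, score) pair becomes "(subject, score)"; pairs are joined with " | ".
    pairs = " | ".join(f"({subject}, {score})" for subject, score in score_list)
    return f"{name} {pairs}"


# Illustrative stand-in for the grouped rows (name, score_list).
rows = [
    ("Tom", [("math", 90), ("physics", 70)]),
    ("Amy", [("math", 95)]),
]

lines = [format_scores(name, scores) for name, scores in rows]
print("\n".join(lines))
```

What I am looking for is the Spark equivalent of this, applied to the score_list column before writing the CSV.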
I'd appreciate any help, thanks.
Update: Here is a similar question, but it's not exactly the same because it goes directly from one string to another string. In my case, I want to first convert string to collect_list<struct> and then stringify this collect_list<struct>.