Transpose pyspark rows into columns

Question

I'm trying to transpose some of the rows of my PySpark DataFrame into columns.

I've made many attempts, but I can't seem to get the correct result.

The DataFrame currently looks like this:

ArticleID | Category | Value
1         | Color    | Black
1         | Gender   | Male
2         | Color    | Green
2         | Gender   | Female
3         | Color    | Blue
3         | Gender   | Male

The result I'm trying to get is:

ArticleID | Color | Gender
1         | Black | Male
2         | Green | Female
3         | Blue  | Male

Edit: This question may overlap with the suggested duplicate in some areas, but this one requires an aggregation that takes the first item for each pivoted cell:

agg(f.first())

The suggested question could use numerical aggregations instead.

Possible duplicate of [How to pivot DataFrame?](https://stackoverflow.com/questions/30244910/how-to-pivot-dataframe) — pault, Apr 04 '19 at 14:28

score 4 · Accepted Answer · answered Apr 04 '19 at 14:07

Use groupBy + pivot:

import pyspark.sql.functions as f
df.groupBy('ArticleID').pivot('Category').agg(f.first('Value')).show()
+---------+-----+------+
|ArticleID|Color|Gender|
+---------+-----+------+
|        3| Blue|  Male|
|        1|Black|  Male|
|        2|Green|Female|
+---------+-----+------+
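
To make it clearer what groupBy + pivot + first computes, here is a minimal plain-Python sketch of the same reshaping, with no Spark dependency. It is an illustration of the semantics only, not how Spark implements it; `pivot_first` is a hypothetical helper name, and the rows are the sample data from the question.

```python
# Sample rows from the question: (ArticleID, Category, Value).
rows = [
    (1, "Color", "Black"),
    (1, "Gender", "Male"),
    (2, "Color", "Green"),
    (2, "Gender", "Female"),
    (3, "Color", "Blue"),
    (3, "Gender", "Male"),
]


def pivot_first(rows):
    """Hypothetical helper: one output row per ArticleID, one column per Category."""
    # The distinct categories become the new column names (sorted here for a stable order).
    categories = sorted({category for _, category, _ in rows})

    grouped = {}
    for article_id, category, value in rows:
        cells = grouped.setdefault(article_id, {})
        # Keep only the first value seen for each (ArticleID, Category) pair.
        cells.setdefault(category, value)

    header = ["ArticleID"] + categories
    # A category missing for an ArticleID becomes None (null in Spark).
    table = [
        [article_id] + [cells.get(category) for category in categories]
        for article_id, cells in grouped.items()
    ]
    return header, table


header, table = pivot_first(rows)
print(header)
for line in table:
    print(line)
```

In Spark itself, row order within a group is not guaranteed, so `first` is only deterministic when each (ArticleID, Category) pair has a single value, as it does in the question's data.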
