
I'm trying to transpose some of my PySpark DataFrame rows into columns.

I've made many attempts, but I can't seem to get the correct result.

The DataFrame currently looks like this:

ArticleID | Category | Value
1         | Color    | Black
1         | Gender   | Male
2         | Color    | Green
2         | Gender   | Female
3         | Color    | Blue
3         | Gender   | Male

The result I'm trying to get is:

ArticleID | Color | Gender
1         | Black | Male
2         | Green | Female
3         | Blue  | Male

Edit: This question may overlap with the suggested duplicate in some respects, but it requires aggregating on the first item for each pivoted row:

.agg(f.first('Value'))

The suggested question aggregates using numerical operations instead.
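What `groupBy(...).pivot(...).agg(f.first(...))` computes can be sketched in plain Python. This is an illustration only, not Spark code, and the helper name `pivot_first` is hypothetical. For each ArticleID it keeps the first Value seen per Category. It also sorts the new column names, as Spark does when no pivot values are supplied:

```python
from collections import defaultdict


def pivot_first(rows, key, pivot_col, value_col):
    # Hypothetical helper (not a Spark API): long-to-wide reshape that keeps
    # the first value encountered for each (key, pivot value) cell.
    wide = defaultdict(dict)
    pivot_values = set()
    for row in rows:
        pivot_values.add(row[pivot_col])
        # setdefault only stores the value if this cell is still empty
        wide[row[key]].setdefault(row[pivot_col], row[value_col])
    # New column names, sorted like Spark's inferred pivot values
    columns = sorted(pivot_values)
    # Rows sorted by key for readability; missing cells become None (Spark: null)
    return [
        {key: k, **{c: wide[k].get(c) for c in columns}}
        for k in sorted(wide)
    ]


rows = [
    {"ArticleID": 1, "Category": "Color", "Value": "Black"},
    {"ArticleID": 1, "Category": "Gender", "Value": "Male"},
    {"ArticleID": 2, "Category": "Color", "Value": "Green"},
    {"ArticleID": 2, "Category": "Gender", "Value": "Female"},
    {"ArticleID": 3, "Category": "Color", "Value": "Blue"},
    {"ArticleID": 3, "Category": "Gender", "Value": "Male"},
]

for r in pivot_first(rows, "ArticleID", "Category", "Value"):
    print(r)
```

Because the values are strings, a numerical aggregate like a sum has nothing meaningful to compute, so "take the first value" is the natural choice.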

FFGH
Possible duplicate of [How to pivot DataFrame?](https://stackoverflow.com/questions/30244910/how-to-pivot-dataframe) – pault Apr 04 '19 at 14:28

1 Answer


Use groupBy + pivot, with first as the aggregate function:

```python
import pyspark.sql.functions as f

df.groupBy('ArticleID').pivot('Category').agg(f.first('Value')).show()
```

```
+---------+-----+------+
|ArticleID|Color|Gender|
+---------+-----+------+
|        3| Blue|  Male|
|        1|Black|  Male|
|        2|Green|Female|
+---------+-----+------+
```
Psidom