
Let's say I have a DataFrame like this:

+---+----+------+
|id |name|salary|
+---+----+------+
|10 |abc |100   |
+---+----+------+

I would like to pivot/transpose the data so that the output looks like this:

+--------+----+
|col_name|data|
+--------+----+
|id      |10  |
|name    |abc |
|salary  |100 |
+--------+----+

How would I do this using PySpark?

2 Answers


You can use stack as follows:

from pyspark.sql.functions import col, expr

# Build one ('column name', `column`) pair per column for stack
s = ','.join([f"'{i}', `{i}`" for i in df.columns])
# Cast every column to string so all values fit in a single output column
df = df.select([col(i).cast('string') for i in df.columns])
df.select(expr(f'''stack({len(df.columns)},{s})''')).show()

+------+----+
|  col0|col1|
+------+----+
|    id|  10|
|  name| abc|
|salary| 100|
+------+----+
Shubham Jain

I'm not aware of a built-in Spark function that transposes a DataFrame directly. You could use expr with stack, as in the other answer, or do something similar manually.

jayrythium