0

I am looking to transpose each record into multiple columns in pyspark dataframe.

This is my dataframe:

+--------+-------------+--------------+------------+------+
|level_1 |level_2      |level_3       |level_4     |UNQ_ID|
+--------+-------------+--------------+------------+------+
|D  Group|Investments  |ORB           |ECM         |1     |
|E  Group|Investment   |Origination   |Execution   |2     |
+--------+-------------+--------------+------------+------+

Required dataframe is:

+--------+---------------+------+
|level   |name           |UNQ_ID|
+--------+---------------+------+
|level_1 |D  Group       |1     |
|level_1 |E  Group       |2     |
|level_2 |Investments    |1     |
|level_2 |Investment     |2     |
|level_3 |ORB            |1     |
|level_3 |Origination    |2     |
|level_4 |ECM            |1     |
|level_4 |Execution      |2     |
+--------+---------------+------+
anky
  • 74,114
  • 11
  • 41
  • 70
santhosh
  • 39
  • 1
  • 5
  • Does this answer your question? [How to melt Spark DataFrame?](https://stackoverflow.com/questions/41670103/how-to-melt-spark-dataframe) – anky Aug 24 '21 at 10:14
  • Hi need help to get the required dataframe format – santhosh Aug 24 '21 at 10:20
  • [This answer](https://stackoverflow.com/a/41673644/9840637) shows exactly that. please refer the same and let us know if you are facing issues with this. There are examples as well to show how it works – anky Aug 24 '21 at 10:25
  • ya thanks for your quick help....its working for me – santhosh Aug 24 '21 at 11:22
  • @anky That answer is outdated. [Stack](https://stackoverflow.com/questions/42465568/unpivot-in-spark-sql-pyspark) is the current function to use. – Kafels Aug 24 '21 at 11:34
  • @Kafels duplicate list should be edited then..!! – anky Aug 24 '21 at 11:38

1 Answers1

1

The easier way using stack function:

import pyspark.sql.functions as f

output_df = df.selectExpr('stack(4, "level_1", level_1, "level_2", level_2, "level_3", level_3, "level_4", level_4) as (level, name)', 'UNQ_ID')
output_df.show()

# +-------+-----------+------+
# |  level|       name|UNQ_ID|
# +-------+-----------+------+
# |level_1|    D Group|     1|
# |level_2|Investments|     1|
# |level_3|        ORB|     1|
# |level_4|        ECM|     1|
# |level_1|    E Group|     2|
# |level_2|Investments|     2|
# |level_3|Origination|     2|
# |level_4|  Execution|     2|
# +-------+-----------+------+
Kafels
  • 3,864
  • 1
  • 15
  • 32