
I'm looking to transpose a small dataframe so that the columns become rows.

For example, suppose I have a dataframe like this:

+---+---+------+
| id|obs|period|
+---+---+------+
|  1|230|  CURR|
|  2|456|  PREV|
+---+---+------+

I would like to have:

+--------+----+----+
|COL_NAME|CURR|PREV|
+--------+----+----+
|      id|   1|   2|
|     obs| 230| 456|
+--------+----+----+

Any help is greatly appreciated. The closest I got was this, from the web:

from pyspark.sql import functions as func

# Use `create_map` to build a map of constant column-name keys to column values
df = df.withColumn('mapCol',
                   func.create_map(func.lit('period'), df.period,
                                   func.lit('col_2'), df.id,
                                   func.lit('col_3'), df.obs))

# Use `explode` to turn the map into one (key, value) row per entry
res = df.select(func.explode(df.mapCol).alias('col_id', 'col_value'))
res.show()

+------+---------+
|col_id|col_value|
+------+---------+
|period|     CURR|
| col_2|        1|
| col_3|      230|
|period|     PREV|
| col_2|        2|
| col_3|      456|
+------+---------+
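
That gets the values into rows, but not into the desired shape. A minimal sketch of one way to finish this approach (it assumes the real column names are used as the map keys, casts the values to a common string type for `create_map`, and uses `first` only to pick the single value in each cell, not to aggregate anything):

from pyspark.sql import functions as func

# Keep `period` alongside each exploded (column name, value) pair,
# then pivot the periods back out as columns.
res = (df
       .withColumn('mapCol', func.create_map(
           func.lit('id'), df.id.cast('string'),
           func.lit('obs'), df.obs.cast('string')))
       .select('period', func.explode('mapCol').alias('COL_NAME', 'col_value'))
       .groupBy('COL_NAME')
       .pivot('period', ['CURR', 'PREV'])
       .agg(func.first('col_value')))
res.show()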
    Possible duplicate of [How to pivot Spark DataFrame?](https://stackoverflow.com/questions/30244910/how-to-pivot-spark-dataframe) – pault Oct 23 '19 at 15:17
  • I don't want an aggregate so it's not a duplicate – user2699504 Oct 23 '19 at 16:03
    Here's another link: [Pyspark: reshape data without aggregation](https://stackoverflow.com/questions/53103269/pyspark-reshape-data-without-aggregation) - and there are multiple links at the bottom of the original linked duplicate which should cover your use case, for example: [Transpose column to row with Spark](https://stackoverflow.com/questions/37864222/transpose-column-to-row-with-spark), [Spark: Transpose DataFrame Without Aggregating](https://stackoverflow.com/questions/40892459/spark-transpose-dataframe-without-aggregating) – pault Oct 23 '19 at 16:05
    Also just this example isn't exactly a good use case for spark (maybe you just made it up). Could this be an [XY Problem](http://www.xyproblem.info)? What is your ultimate goal? There may be a different (better) approach. For example - if your data is small, maybe you'd be better off collecting to `pandas` and doing the transpose outside of spark. – pault Oct 23 '19 at 16:21
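
A minimal sketch of that pandas route from the last comment, assuming the dataframe really is small enough to collect to the driver and that `spark` is an active SparkSession:

# Collect to the driver and transpose in pandas; `period` becomes the header row.
pdf = df.toPandas().set_index('period').T
pdf.index.name = 'COL_NAME'
print(pdf)

# If a Spark dataframe is needed afterwards (hypothetical round trip):
# res = spark.createDataFrame(pdf.reset_index())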

1 Answer


Here is the answer I came up with; thanks to all who tried to help.

spark.sql("""
    select 'ID' as COL_NAME,
           max(case when period = 'CURR' then id end) as CURR,
           max(case when period = 'PREV' then id end) as PREV from df
    union
    select 'OBS', max(case when period = 'CURR' then obs end),
           max(case when period = 'PREV' then obs end) from df
""").show()

+--------+----+----+
|COL_NAME|CURR|PREV|
+--------+----+----+
|      ID|   1|   2|
|     OBS| 230| 456|
+--------+----+----+
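
For reference, the same result without the SQL string, as a sketch using the DataFrame API: `stack` unpivots `id`/`obs` into rows, `pivot` spreads the periods into columns, and `first` only selects the single value in each cell, so nothing is truly aggregated:

from pyspark.sql import functions as func

# Unpivot id/obs into (COL_NAME, value) rows, then spread the periods out.
res = (df
       .select('period', func.expr("stack(2, 'ID', id, 'OBS', obs) as (COL_NAME, value)"))
       .groupBy('COL_NAME')
       .pivot('period', ['CURR', 'PREV'])
       .agg(func.first('value')))
res.show()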