
I have a DataFrame as below, and I need to transpose it into key-value pairs, where the key is the column name and the value is that column's value.

+---+----+------+-----+
|age| dob|gender| name|
+---+----+------+-----+
| 25|1991|     M|Ankit|
+---+----+------+-----+

Required Output

+-------+-------+
|Key    |Value  |
+-------+-------+
|age    |25     |
|dob    |1991   |
|gender |M      |
|name   |Ankit  |
+-------+-------+
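
For reproducibility, here is a minimal sketch that builds the sample DataFrame above from the values shown, assuming a local PySpark setup:

from pyspark.sql import SparkSession

# Build the one-row sample DataFrame from the question
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(25, 1991, "M", "Ankit")],
    ["age", "dob", "gender", "name"],
)
df.show()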

I tried the sample code given in the following link: https://codereview.stackexchange.com/questions/200391/pyspark-code-that-turns-columns-into-rows

But it gives me the error below:

cPickle.PicklingError: Could not serialize object: Py4JError: An error occurred while calling o149.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Any help on this aspect would be really helpful.

DataWrangler
  • Possible duplicate of [How to melt Spark DataFrame?](https://stackoverflow.com/questions/41670103/how-to-melt-spark-dataframe) – user10938362 May 21 '19 at 15:26
  • Possible duplicate of [Transpose column to row with Spark](https://stackoverflow.com/questions/37864222/transpose-column-to-row-with-spark) – pault May 21 '19 at 15:29
  • @pault I had tried the same too, but it didn't seem to work; the answer you provided below worked like a charm. Thanks a ton – DataWrangler May 22 '19 at 08:46
  • @user10938362 I had tried that piece of code too but didn't get the expected output; if you could explain the code used in that link, it would be really helpful – DataWrangler May 22 '19 at 08:49

1 Answer


Another option in this case would be to create a `MapType` column from your columns and explode it:

from itertools import chain
from pyspark.sql.functions import col, create_map, explode, lit

# Build alternating key/value arguments (lit(name), col(name), ...) for
# create_map, then explode the map into one (key, value) row per column
df.select(
    explode(create_map(*chain.from_iterable([(lit(c), col(c)) for c in df.columns])))
).show()
#+------+-----+
#|   key|value|
#+------+-----+
#|   age|   25|
#|   dob| 1991|
#|gender|    M|
#|  name|Ankit|
#+------+-----+
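
Note that map values must share a single type, so the integer columns here end up coerced to strings in the value column, as the output above shows. If you prefer to make that coercion explicit (a minor variation on the code above, not something the answer requires), you can cast each value yourself:

from itertools import chain
from pyspark.sql.functions import col, create_map, explode, lit

# Same approach, but with an explicit cast of every value to string
df.select(
    explode(create_map(*chain.from_iterable(
        [(lit(c), col(c).cast("string")) for c in df.columns]
    )))
).show()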
pault
  • Single-line code, works wonders :) Thanks. If you don't mind, could you please explain what's happening within that single line? – DataWrangler May 22 '19 at 08:47
  • @Joby `create_map` takes an even number of arguments, which are alternating keys and values. The list comprehension creates a tuple of the literal column name (`lit(c)`) and the column value (`col(c)`) for every column in your DataFrame. `chain.from_iterable` flattens this nested structure, and the `*` operator unpacks the arguments to pass them to `create_map`. Finally, `explode` turns the `MapType` column into rows of `key` and `value`. – pault May 22 '19 at 19:09
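
To make the explanation in the comment above concrete, here is the same one-liner unrolled into its intermediate steps (a sketch, assuming the one-row df from the question):

from itertools import chain
from pyspark.sql.functions import col, create_map, explode, lit

# Step 1: one (key, value) tuple per column, e.g.
# [(lit('age'), col('age')), (lit('dob'), col('dob')), ...]
pairs = [(lit(c), col(c)) for c in df.columns]

# Step 2: flatten into alternating key/value arguments:
# [lit('age'), col('age'), lit('dob'), col('dob'), ...]
args = list(chain.from_iterable(pairs))

# Step 3: build a single MapType column from the arguments,
# then explode it into one (key, value) row per map entry
df.select(explode(create_map(*args)).alias("key", "value")).show()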