
I have a dataframe with a single row and multiple columns. I would like to convert it into multiple rows. I found a similar question here on Stack Overflow.

The question shows how it can be done in Scala, but I want to do this in PySpark. I tried to replicate the code in PySpark but wasn't able to.

I am not able to convert the Scala code below to Python:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, map}

// Interleave each column name with its value: [lit(c1), col(c1), lit(c2), col(c2), ...]
val columnsAndValues: Array[Column] = df1.columns.flatMap(c => Array(lit(c), col(c)))
val df2 = df1.withColumn("myMap", map(columnsAndValues: _*))

1 Answer


In PySpark you can use the create_map function to create a map column, and a list comprehension with itertools.chain to get the equivalent of Scala's flatMap:

import itertools
from pyspark.sql import functions as F

# Flatten the [(lit(name), col(name)), ...] pairs into one alternating key/value sequence
columns_and_values = itertools.chain(*[(F.lit(c), F.col(c)) for c in df1.columns])
df2 = df1.withColumn("myMap", F.create_map(*columns_and_values))
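Since the goal stated in the question is to end up with multiple rows, the map column can then be exploded into one row per key/value pair. Below is a minimal, self-contained sketch of that; df1 here is a hypothetical single-row DataFrame with illustrative column names, and all values are strings because create_map requires the map values to share a common type:

import itertools
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical single-row DataFrame standing in for df1
df1 = spark.createDataFrame([("1", "Alice", "NY")], ["id", "name", "city"])

# Same construction as in the answer above
columns_and_values = itertools.chain(*[(F.lit(c), F.col(c)) for c in df1.columns])
df2 = df1.withColumn("myMap", F.create_map(*columns_and_values))

# explode on a map column produces one row per entry, in `key`/`value` columns
df2.select(F.explode(F.col("myMap"))).show()
# +----+-----+
# | key|value|
# +----+-----+
# |  id|    1|
# |name|Alice|
# |city|   NY|
# +----+-----+

The net effect is the wide single row pivoted into long form: one row per original column, with the column name in key and its value in value.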