Spark - convert Map to a single-row DataFrame

Question

In my application I have a need to create a single-row DataFrame from a Map.

For example, a Map like

("col1" -> 5, "col2" -> 10, "col3" -> 6)

should be transformed into a DataFrame with a single row, where the map keys become the column names:

col1 | col2 | col3
5    | 10   | 6

In case you are wondering why I would want this: I just need to save a single document with some statistics into MongoDB using the MongoSpark connector, which allows saving DataFrames and RDDs.

Are the keys ordered, or do you want to sort them alphabetically? — Andrey Tyukin, Mar 20 '18 at 14:03

Andrey Tyukin · Answer 1 · 2018-03-20T14:29:37.440

I figured that sorting the column names wouldn't hurt anyway.

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types._
  val map = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)
  // sort by key for a stable column order, then split keys from values
  val (keys, values) = map.toList.sortBy(_._1).unzip
  // a single Row wrapped in an RDD
  val rows = spark.sparkContext.parallelize(Seq(Row(values: _*)))
  val schema = StructType(keys.map(
    k => StructField(k, IntegerType, nullable = false)))
  val df = spark.createDataFrame(rows, schema)
  df.show()

Gives:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   5|   6|  10|
+----+----+----+

The idea is straightforward: convert the map to a list of tuples, unzip it, turn the keys into a schema and the values into a single-row RDD, then build the DataFrame from the two pieces. (The interface of createDataFrame is a bit strange here: together with an explicit schema it accepts an RDD[Row], a java.util.List[Row] and kitchen sinks, but for some reason not the usual Scala List[Row].)
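The java.util.List overload of createDataFrame can be used to skip the RDD step. This is a minimal sketch, assuming a local SparkSession built here for illustration (in spark-shell the existing `spark` value would be used instead); the app name is arbitrary.

```scala
import scala.collection.JavaConverters._

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("map-to-row").getOrCreate()

val map = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)

// Sort by key so the column order is deterministic, then split keys from values.
val (keys, values) = map.toList.sortBy(_._1).unzip

val schema = StructType(keys.map(k => StructField(k, IntegerType, nullable = false)))

// createDataFrame(java.util.List[Row], StructType) takes the rows directly,
// so the single Row does not have to be wrapped in an RDD first.
val df = spark.createDataFrame(Seq(Row(values: _*)).asJava, schema)
df.show()
```

The schema and row are built exactly as in the answer; only the container handed to createDataFrame differs.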

I'm using scala 2.11 and (I think) as such, in the above map.toList.sortBy(_._1).unzip does not compile: toList is not a member of map, ._1 is not a number.... any idea how to fix this? — David Urry, Nov 13 '19 at 04:18
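The comment does not show the failing code, so the cause can only be guessed. One possibility (an assumption, not something the thread confirms) is that `map` there was a java.util.Map rather than a Scala Map; a Java map has no toList. A sketch of converting it first, using a hypothetical `javaMap` value:

```scala
import scala.collection.JavaConverters._

// Hypothetical input: a Java map, e.g. one returned by a Java API.
val javaMap = new java.util.HashMap[String, Int]()
javaMap.put("col1", 5)
javaMap.put("col2", 6)
javaMap.put("col3", 10)

// asScala wraps the Java map as a Scala collection; toList then yields
// (key, value) tuples, so sortBy(_._1) and unzip work as in the answer.
val (keys, values) = javaMap.asScala.toList.sortBy(_._1).unzip
```

If the value is already a Scala Map, the line from the answer compiles on Scala 2.11 as written, and the problem would lie elsewhere.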

Raphael Roth · Accepted Answer · 2018-03-20T14:20:28.160

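Here you go:

import org.apache.spark.sql.functions.lit
import spark.implicits._ // needed for toDF on a Seq

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)

// seed a one-row DataFrame from the first entry, then add the remaining entries as literal columns
val df = map.tail
  .foldLeft(Seq(map.head._2).toDF(map.head._1))((acc, curr) => acc.withColumn(curr._1, lit(curr._2)))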

df.show()

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   5|   6|  10|
+----+----+----+

edited Mar 20 '18 at 14:20

answered Mar 20 '18 at 14:11

score 0 · Answer 3 · answered Dec 18 '20 at 18:41

A slight variation on Raphael's answer: create a 1x1 DataFrame with a dummy column, add the map elements using foldLeft, and finally drop the dummy column. That way, the foldLeft is straightforward and easy to remember.

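import org.apache.spark.sql.functions.lit
import spark.implicits._ // needed for toDF on a Seq

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)

val f = Seq("1").toDF("dummy") // 1x1 placeholder DataFrame to fold onto

map.keys.toList.sorted.foldLeft(f) { (acc, x) => acc.withColumn(x, lit(map(x))) }
  .drop("dummy").show(false)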

+----+----+----+
|col1|col2|col3|
+----+----+----+
|5   |6   |10  |
+----+----+----+
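The same idea can also be written without a dummy column to drop. This is a different technique from the foldLeft above, not the answerer's method: a sketch that starts from `spark.range(1)` for the single row and projects all literals in one select. It assumes a local SparkSession built here for illustration; in spark-shell the existing `spark` value would be used.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("map-to-row-select").getOrCreate()

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)

// One literal column per map entry, sorted by key for a stable column order.
val cols = map.toSeq.sortBy(_._1).map { case (k, v) => lit(v).as(k) }

// spark.range(1) supplies exactly one row; selecting only the literals
// leaves its `id` column out of the result, so nothing has to be dropped.
val df = spark.range(1).select(cols: _*)
df.show(false)
```

A single select adds all columns in one projection, whereas each withColumn call in a fold adds one more projection to the plan; for a map with only a few keys the difference is unlikely to matter.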
