
My question: how do I read the value of a MapType column from a Spark DataFrame as a Scala Map[String, String]? Here is a sample:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.{DataFrame, Column, Dataset, Row}
import org.apache.spark.sql.types._
// spark.implicits._ provides the $"..." syntax (spark-shell imports it automatically)
import spark.implicits._

val data = List(
  Row("miley",
      Map("good_songs" -> "wrecking ball",
          "bad_songs" -> "younger now")),
  Row("kesha",
      Map("good_songs" -> "tik tok",
          "bad_songs" -> "rainbow"))
)

val schema = List(
  StructField("singer", StringType, true),
  StructField("songs", MapType(StringType, StringType, true))
)

val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(schema)
)

// This returns scala.collection.Map[Nothing, Nothing]
someDF.select($"songs").head().getMap(0)

// Therefore, this won't compile:
val myHappyMap: Map[String, String] = someDF.select($"songs").head().getMap(0)

I don't understand why I get a Map[Nothing, Nothing] when I described the MapType column explicitly in the schema. Moreover, someDF.schema returns org.apache.spark.sql.types.StructType = StructType(StructField(singer,StringType,true), StructField(songs,MapType(StringType,StringType,true),true)), which shows that the DataFrame schema is set correctly.

I've read "extract or filter MapType of Spark DataFrame" and also "How to get keys and values from MapType column in SparkSQL DataFrame". I thought the latter would at least let me extract the keys and the values separately, but I still get the values as WrappedArray(Nothing), which adds complication for no real gain.
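A minimal sketch, assuming Spark 2.3 or later (where map_keys and map_values exist in org.apache.spark.sql.functions): the keys/values approach hits the same Nothing problem because Row.getSeq is generic too, and passing the element type explicitly fixes it. MapKeysValuesSketch and keysAndValues are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, map_keys, map_values}
import org.apache.spark.sql.types._

object MapKeysValuesSketch {
  // Hypothetical helper: builds a one-row DataFrame shaped like the question's
  // and returns the keys and values of its "songs" column as typed sequences.
  def keysAndValues(spark: SparkSession): (Seq[String], Seq[String]) = {
    val schema = StructType(List(
      StructField("singer", StringType, true),
      StructField("songs", MapType(StringType, StringType, true))
    ))
    val rows = List(Row("miley", Map("good_songs" -> "wrecking ball", "bad_songs" -> "younger now")))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

    val first = df.select(map_keys(col("songs")), map_values(col("songs"))).head()
    // getSeq is generic as well; without [String] the element type would be inferred as Nothing.
    (first.getSeq[String](0), first.getSeq[String](1))
  }
}
```

Because both results have the static type Seq[String], keys.zip(values).toMap rebuilds a Map[String, String] if you need one.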

What am I missing here?

Lucas Lima

1 Answer


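getMap is a generic method, getMap[K, V](i), and it cannot infer K and V from the DataFrame schema: the schema is only known at runtime, while type parameters are resolved at compile time. With no type arguments, the compiler infers Nothing for both, so you have to state them explicitly: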

val myHappyMap: Map[String, String] = someDF.select($"songs").head().getMap[String, String](0).toMap
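To see the inference issue without starting Spark, here is a minimal pure-Scala sketch. MiniRow is a hypothetical class that mimics how a Row holds untyped values and exposes a generic accessor shaped like getMap; it is not Spark's actual implementation.

```scala
// Hypothetical stand-in for Spark's Row: values are stored as Any, and the
// generic accessor simply casts, so the compiler cannot know K and V on its own.
final class MiniRow(values: Seq[Any]) {
  def getMap[K, V](i: Int): scala.collection.Map[K, V] =
    values(i).asInstanceOf[scala.collection.Map[K, V]]
}

object InferenceSketch {
  val row = new MiniRow(Seq(Map("good_songs" -> "tik tok", "bad_songs" -> "rainbow")))

  // No type arguments and no expected type: K and V are inferred as Nothing.
  val untyped = row.getMap(0) // static type: scala.collection.Map[Nothing, Nothing]

  // Explicit type arguments fix K and V; .toMap then yields an immutable Map.
  val typed: Map[String, String] = row.getMap[String, String](0).toMap
}
```

The runtime data is identical in both cases; only the static type the compiler assigns differs.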

The toMap at the end converts the result from scala.collection.Map to scala.collection.immutable.Map. They are different types, and a plain Map in a type annotation refers to the immutable one (Predef.Map is an alias for scala.collection.immutable.Map).
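A small pure-Scala sketch of the distinction: a scala.collection.Map may be mutable or immutable, so it cannot be assigned directly to an immutable Map, and toMap copies the entries into one. MapKindsSketch is a hypothetical name; the mutable map only stands in for whatever getMap happens to return.

```scala
object MapKindsSketch {
  // getMap is declared to return scala.collection.Map, the common supertype of
  // mutable and immutable maps; a mutable map is used here to make that concrete.
  val general: scala.collection.Map[String, String] =
    scala.collection.mutable.Map("good_songs" -> "tik tok")

  // val direct: Map[String, String] = general // does not compile: Map here is immutable.Map

  // toMap copies the entries into a scala.collection.immutable.Map.
  val immutableCopy: Map[String, String] = general.toMap
}
```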