Well, the question is pretty much that. Let me provide a sample:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Column, DataFrame, Dataset, Row}
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}
val data = List(
  Row("miley",
    Map(
      "good_songs" -> "wrecking ball",
      "bad_songs" -> "younger now"
    )
  ),
  Row("kesha",
    Map(
      "good_songs" -> "tik tok",
      "bad_songs" -> "rainbow"
    )
  )
)
val schema = List(
  StructField("singer", StringType, true),
  StructField("songs", MapType(StringType, StringType, true), true)
)

val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(schema)
)
// This returns scala.collection.Map[Nothing, Nothing]:
someDF.select($"songs").head().getMap(0)

// Therefore, this won't compile:
val myHappyMap: Map[String, String] = someDF.select($"songs").head().getMap(0)
I don't understand why I'm getting a Map[Nothing, Nothing] when I've properly described the desired schema for the MapType column. Not only that: when I call someDF.schema, what I get is

org.apache.spark.sql.types.StructType = StructType(StructField(singer,StringType,true), StructField(songs,MapType(StringType,StringType,true),true))

showing that the DataFrame schema is properly set.
I've read extract or filter MapType of Spark DataFrame, and also How to get keys and values from MapType column in SparkSQL DataFrame. I thought the latter would at least let me extract the keys and the values separately, but I still get the values as WrappedArray(Nothing), so it just adds extra complication for no real gain.
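For reference, this is roughly what that attempt looks like (a sketch of my reading of the linked answer, assuming Spark 2.3+'s map_values from org.apache.spark.sql.functions; the variable name is mine):

// Pull just the values out of the map column, per the linked answer.
// Without an explicit type parameter, getSeq is inferred as Seq[Nothing],
// which is the WrappedArray(Nothing) situation I described above:
val songValues = someDF.select(map_values($"songs")).head().getSeq(0)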
What am I missing here?