I have data in a parquet file which has 2 fields: object_id: String and alpha: Map<>.
It is read into a DataFrame in Spark SQL, and the schema looks like this:
scala> alphaDF.printSchema()
root
 |-- object_id: string (nullable = true)
 |-- ALPHA: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
I am using Spark 2.0 and I am trying to create a new DataFrame whose columns are object_id plus the keys of the ALPHA map, as in object_id, key1, key2, key3, ...
I was first trying to see if I could at least access the map like this:
scala> alphaDF.map(a => a(0)).collect()
<console>:32: error: Unable to find encoder for type stored in a Dataset.
Primitive types (Int, String, etc) and Product types (case classes) are
supported by importing spark.implicits._ Support for serializing other
types will be added in future releases.
alphaDF.map(a => a(0)).collect()
Unfortunately, I can't figure out how to access the keys of the map.
Can someone please show me a way to get object_id plus the map keys as column names, with the map values as the respective values, in a new DataFrame?
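
For illustration, here is a rough sketch of the kind of result being asked for. It is only one possible approach, not a confirmed solution: it assumes the set of distinct keys is small enough to collect to the driver, and the Payload case class, the sample rows, and the names keys, keyCols and wideDF are all hypothetical stand-ins for the real parquet data. It relies on explode and Column.getItem rather than on map over rows, so no custom encoder is needed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object MapKeysToColumns {
  // Hypothetical stand-in for the struct stored as the map value.
  case class Payload(score: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-keys-to-columns")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the parquet data.
    val alphaDF = Seq(
      ("obj1", Map("key1" -> Payload(1.0), "key2" -> Payload(2.0))),
      ("obj2", Map("key2" -> Payload(3.0)))
    ).toDF("object_id", "ALPHA")

    // explode on a map column yields one row per entry,
    // with the default column names "key" and "value".
    val keys: Array[String] = alphaDF
      .select(explode(col("ALPHA")))
      .select("key")
      .distinct()
      .collect()
      .map(_.getString(0))
      .sorted

    // One column per distinct key; rows that lack a key get null there.
    val keyCols = keys.map(k => col("ALPHA").getItem(k).as(k))
    val wideDF = alphaDF.select(col("object_id") +: keyCols: _*)

    wideDF.printSchema()
    spark.stop()
  }
}
```

In spark-shell only the part from the explode step onward would be needed, run against the real alphaDF. Collecting the keys first costs an extra pass over the data, which is the trade-off for getting a fixed set of column names.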