Context
I have a DataFrame containing (what I think are) pairs of (String, String).
It looks like this:
> df.show
| Col1 | Col2     |
| A    | [k1, v1] |
| A    | [k2, v2] |
> df.printSchema
|-- _1: string (nullable = true)
|-- _2: struct (nullable = true)
| |-- _1: string (nullable = true)
| |-- _2: string (nullable = true)
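This schema is what Spark produces when a Scala tuple is stored in a column: the tuple becomes a struct whose fields are named _1 and _2. Below is a minimal sketch that rebuilds a similar DataFrame. It assumes Spark 2.x or later with a local SparkSession. The object name, method name, and sample values are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReproduceStructColumn {
  // Assumption: a local session is enough for experimenting.
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("struct-column-demo").getOrCreate()

  // Hypothetical sample data: the nested tuple ("k1", "v1") is encoded
  // as a struct column whose fields are named _1 and _2.
  def sampleDf(): DataFrame = {
    import spark.implicits._
    Seq(("A", ("k1", "v1")), ("A", ("k2", "v2"))).toDF("Col1", "Col2")
  }

  def main(args: Array[String]): Unit = {
    val df = sampleDf()
    df.printSchema() // Col2 is printed as a struct with _1 and _2 string fields
    df.show()
    spark.stop()
  }
}
```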
Col2 used to contain a Map[String, String], on which I called toList() and then explode() to obtain one row per mapping in the original Map.
Question
I would like to split Col2 into two columns and obtain this DataFrame:
| Col1 | key | value |
| A | k1 | v1 |
| A | k2 | v2 |
Does anyone know how to do this?
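Because Col2 is a struct, one way to split it is to select its fields with dot notation and rename them. Below is a minimal sketch. It assumes Spark 2.x or later and the column names Col1/Col2 from the table above. The printed schema shows _1/_2 at the top level, so adjust the names if the DataFrame has not been renamed. Object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SplitStructColumn {
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("split-struct").getOrCreate()

  // Hypothetical input with the same shape as in the question.
  def sampleDf(): DataFrame = {
    import spark.implicits._
    Seq(("A", ("k1", "v1")), ("A", ("k2", "v2"))).toDF("Col1", "Col2")
  }

  // Pull the two struct fields out with dot notation and give them new names.
  def splitPairs(df: DataFrame): DataFrame =
    df.select(
      col("Col1"),
      col("Col2._1").as("key"),
      col("Col2._2").as("value")
    )

  def main(args: Array[String]): Unit = {
    splitPairs(sampleDf()).show()
    spark.stop()
  }
}
```

An equivalent spelling is col("Col2").getField("_1"). Writing df.select("Col1", "Col2.*") also expands the struct, but it keeps the field names _1 and _2.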
Alternatively, does anyone know how to explode and split a map into multiple rows (one per mapping) and two columns (one for the key, one for the value)?
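For the map case, Spark's explode function applied directly to a MapType column already yields one row per entry. The entries come out in two columns named key and value, so the toList() step should not be needed. Below is a minimal sketch. It assumes Spark 2.x or later and a hypothetical map column named Col2.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeMapColumn {
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("explode-map").getOrCreate()

  // Hypothetical input: Col2 still holds the original Map[String, String].
  def mapDf(): DataFrame = {
    import spark.implicits._
    Seq(("A", Map("k1" -> "v1", "k2" -> "v2"))).toDF("Col1", "Col2")
  }

  // explode on a map column emits one row per entry, as two columns named key and value.
  def explodeMap(df: DataFrame): DataFrame =
    df.select(col("Col1"), explode(col("Col2")))

  def main(args: Array[String]): Unit = {
    explodeMap(mapDf()).show()
    spark.stop()
  }
}
```

To choose other column names, explode(col("Col2")).as(Seq("k", "v")) aliases both generated columns.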
What I have tried / Error
I tried the pattern that usually works for (String, String) tuples, but it fails here:
df.select("Col1", "Col2").
  map(r => (
    r(0).asInstanceOf[String],
    r(1).asInstanceOf[(String, String)](0),
    r(1).asInstanceOf[(String, String)](1)
  ))
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2
==> I guess the type of Col2 is org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema, but I could not find any Spark/Scala documentation for this class.
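GenericRowWithSchema is an internal implementation of org.apache.spark.sql.Row. Spark hands back struct values as nested Row objects, which is why the cast to Tuple2 fails. One way to read the nested values is through the public Row API, using getAs with field names instead of positions. Below is a minimal sketch. It assumes Spark 2.x or later. Object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object RowStructAccess {
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("row-struct-access").getOrCreate()

  // Hypothetical input with the same shape as in the question.
  def sampleDf(): DataFrame = {
    import spark.implicits._
    Seq(("A", ("k1", "v1")), ("A", ("k2", "v2"))).toDF("Col1", "Col2")
  }

  // A struct value arrives as a nested Row, so read it with getAs[Row]
  // and then read its fields by name instead of casting to a Tuple2.
  def splitWithRows(df: DataFrame): DataFrame = {
    import spark.implicits._
    df.map { r =>
      val pair = r.getAs[Row]("Col2")
      (r.getAs[String]("Col1"), pair.getAs[String]("_1"), pair.getAs[String]("_2"))
    }.toDF("Col1", "key", "value")
  }

  def main(args: Array[String]): Unit = {
    splitWithRows(sampleDf()).show()
    spark.stop()
  }
}
```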
And even if that worked, indexing with (0) and (1) is not the right way to access the elements of a Scala tuple (those are ._1 and ._2)...
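If tuple-style access is preferred, the DataFrame can be converted to a typed Dataset with as[...]. The struct column then decodes into a Tuple2, whose elements are read with ._1 and ._2. Below is a minimal sketch. It assumes Spark 2.x or later, where Dataset.as maps columns to tuple elements by position when the target type is a tuple. Object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object TypedTupleAccess {
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("typed-tuple-access").getOrCreate()

  // Hypothetical input with the same shape as in the question.
  def sampleDf(): DataFrame = {
    import spark.implicits._
    Seq(("A", ("k1", "v1")), ("A", ("k2", "v2"))).toDF("Col1", "Col2")
  }

  def splitTyped(df: DataFrame): DataFrame = {
    import spark.implicits._
    // as[...] maps columns onto tuple elements by position; the struct becomes a Tuple2.
    val ds: Dataset[(String, (String, String))] = df.as[(String, (String, String))]
    // Tuple elements are read with ._1 and ._2, not with (0) and (1).
    ds.map(t => (t._1, t._2._1, t._2._2)).toDF("Col1", "key", "value")
  }

  def main(args: Array[String]): Unit = {
    splitTyped(sampleDf()).show()
    spark.stop()
  }
}
```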
Thanks!