Suppose we have a DataFrame with a column of map type.
val df = spark.sql("""select map("foo", 1, "bar", 2) AS mapColumn""")
df.show()
// +--------------------+
// | mapColumn|
// +--------------------+
// |{foo -> 1, bar -> 2}|
// +--------------------+
What is the most straightforward way to convert it to a struct type (or, equivalently, to define a new column with the same keys and values, but as a struct)? The following spark-shell (2.4.5) session shows an insanely inefficient way of going about it:
// Round-trip through JSON: serialize the map to a JSON string, pull it back
// to the driver, then let the JSON reader infer a struct schema from it.
val jsonStr = df.select(to_json($"mapColumn")).collect()(0)(0).asInstanceOf[String]
spark.read.json(Seq(jsonStr).toDS()).show()
// +---+---+
// |bar|foo|
// +---+---+
// | 2| 1|
// +---+---+
Now, obviously, collecting to the driver and re-parsing through the JSON reader is very inefficient (and this version only ever looks at the first row); it is generally an awful way to do things in Spark. But what is the preferred way to accomplish this conversion? named_struct
and struct
both take a sequence of parameter values to construct the result, but I can't find any way to "unwrap" the map keys/values to pass them to these functions.