I want to load a struct from a database collection and attach it as a constant column to every row of a target DataFrame.
I can load the column I need as a one-row DataFrame, then do a crossJoin to paste it onto each row of the target:
import org.apache.spark.sql.functions.broadcast

val parentCollectionDF = /* ... load a single row from the database */
// Broadcast the one-row frame so the cross join avoids a shuffle
val constantCol = broadcast(parentCollectionDF.select("my_column"))
val result = childCollectionDF.crossJoin(constantCol)
It works, but it feels wasteful: the value is identical for every row of the child collection, yet the crossJoin still attaches a copy to each row.
If I could hardcode the values, I could use something like childCollectionDF.withColumn("my_column", struct(lit(val1) as "field1", lit(val2) as "field2" /* etc. */))
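For concreteness, a runnable version of that hardcoded approach would look like this (field names and values invented for illustration):

import org.apache.spark.sql.functions.{lit, struct}

// Invented field names/values, just to show the shape of the call
val hardcoded = childCollectionDF.withColumn(
  "my_column",
  struct(lit(42) as "field1", lit("foo") as "field2"))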
But I don't know them ahead of time; I need to load the struct from the parent collection.
What I'm looking for is something like:
childCollectionDF.withColumn("my_column",
  lit(parentCollectionDF.select("my_column").head().getStruct(0)))
... but I can see from the code for literals that only basic types can be used as an argument to lit(). Passing a GenericRowWithSchema or a case class is no good here.
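The closest workaround I can see is to collect the parent row to the driver and rebuild the struct field by field. Here is a sketch, assuming every one of the struct's fields is a simple type that lit() accepts; it is exactly the kind of clumsiness I'd like to avoid:

import org.apache.spark.sql.functions.{lit, struct}

// Pull the single parent row to the driver, flattening the struct's fields
// into top-level columns
val parentRow = parentCollectionDF.selectExpr("my_column.*").head()

// Rebuild the struct column from per-field literals; this only works if
// every field is a simple type that lit() accepts
val fields = parentRow.schema.fields.zipWithIndex.map { case (f, i) =>
  lit(parentRow.get(i)) as f.name
}
val result = childCollectionDF.withColumn("my_column", struct(fields: _*))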
Is there a less clumsy way to do this? (Spark 2.1.1, Scala)
[edit: Not the same as this question, which explains how to add a struct with literal (hardcoded) constants. My struct needs to be loaded dynamically.]