I am trying to union two Spark DataFrames that have different sets of columns. For this, I followed the approach from this question:
How to perform union on two DataFrames with different amounts of columns in spark?
My code is as follows:
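import org.apache.spark.sql.functions.{col, lit}

// Build one select expression per name in allCols:
// existing columns are kept, missing ones become null literals.
def expr(myCols: Set[String], allCols: Set[String]) = {
  allCols.toList.map(x => x match {
    case x if myCols.contains(x) => col(x)
    case _ => lit(null).as(x)
  })
}

val cols1 = finalDF.columns.toSet
val cols2 = df.columns.toSet
val total = cols1 ++ cols2 // column names from both DataFrames

// finalDF has to be declared as a var for this reassignment to compile
finalDF = finalDF.select(expr(cols1, total): _*).unionAll(df.select(expr(cols2, total): _*))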
The problem is that some of the columns in both DataFrames are nested: I have columns of both StructType and primitive types. Say column A (of StructType) is in df but not in finalDF. In expr, the fallback branch
case _ => lit(null).as(x)
does not produce a StructType column, because the null literal has NullType. That is why I am not able to union them. It gives me the following error:
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. NullType <> StructType(StructField(_VALUE,StringType,true), StructField(_id,LongType,true)) at the first column of the second table.
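To make the mismatch concrete, here is a minimal sketch of one direction that might work: take each missing column's type from the other DataFrame's schema and cast the null literal to it. Everything named here (TypedNullSketch, typedExpr, unionWithTypedNulls, left, right and the sample data) is hypothetical and only for illustration. It assumes Spark 2.0 or later (SparkSession and union), and it does not handle columns that share a name but differ in type.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.StructField

object TypedNullSketch {
  // Hypothetical variant of expr: a missing column becomes a null literal
  // cast to the type that column has in the other DataFrame.
  def typedExpr(myCols: Set[String], allFields: Seq[StructField]): Seq[Column] =
    allFields.map { f =>
      if (myCols.contains(f.name)) col(f.name)
      else lit(null).cast(f.dataType).as(f.name)
    }

  // Hypothetical helper: union two DataFrames over the combined column list.
  def unionWithTypedNulls(left: DataFrame, right: DataFrame): DataFrame = {
    val leftCols = left.columns.toSet
    // Fields of left, followed by the fields that only right has.
    val allFields: Seq[StructField] =
      left.schema.fields.toSeq ++ right.schema.fields.filterNot(f => leftCols.contains(f.name))
    left.select(typedExpr(leftCols, allFields): _*)
      .union(right.select(typedExpr(right.columns.toSet, allFields): _*))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("typed-null-sketch").getOrCreate()
    import spark.implicits._

    // Made-up sample data: only `right` has the struct column A.
    val left = Seq((1L, "x")).toDF("id", "name")
    val right = Seq((2L, ("v", 10L))).toDF("id", "A")

    // An untyped null literal is NullType, the left side of the mismatch in the error.
    println(left.select(lit(null).as("A")).schema("A").dataType)

    // With typed nulls, A is a struct on both sides, so the union can resolve.
    unionWithTypedNulls(left, right).printSchema()
    spark.stop()
  }
}
```

The same field list is used for both select calls, so the columns line up by position, which is what union compares.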
Any suggestions on what I can do here?