My code looks like this:
dataFrame.createOrReplaceTempView("tableName1")
// udf1 and udf2 each return a struct
val dataFrame1 = sql("select udf1(a) as col1, udf2(a) as col2 from tableName1")
dataFrame1.createOrReplaceTempView("tableName2")
val dataFrame2 = sql("select col1.x, col1.y, col1.z, col2.x, col2.y, col2.z from tableName2")
dataFrame2.write.parquet("path")
The physical plan shown in the Spark UI looks like this:
select udf1(a).x,udf1(a).y,udf1(a).z,udf2.x,udf2.y,udf2.z from tableName2
It seems that udf1 and udf2 will each be invoked 3 times, once per extracted field.
What I really want is for udf1 and udf2 to be computed only once per row.
I hope someone can help me.