I would like to use gapply as described at
https://spark.apache.org/docs/latest/sparkr.html#gapply
The problem is that my function returns a list of 2 data frames:
return(list(df1, df2))
How do I declare the output schema in this case?
You cannot use a function that returns an arbitrary list. As per the gapply
documentation (emphasis mine):
"The function func takes as argument a key - grouping columns and a data frame - a local R data.frame. The output of func is a local R data.frame."
You might be able to make it work by treating each data.frame as a single Row of a type equivalent to something like struct<col1:array<typeofcol1>, col2:array<typeofcol2>, ..., coln:array<typeofcoln>>, but only as long as both output data.frames have an identical schema.
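
If the two data.frames really do share a schema, a simpler alternative to the struct-of-arrays idea (a different technique, not something the documentation prescribes) is to stack them into one data.frame and add a column saying which one each row came from. Here is a minimal sketch in plain R, so it can be checked without a Spark session. The names summarise_group, part, stat and value are made up for illustration, and the min/max contents are only placeholders for whatever df1 and df2 actually hold.

```r
# Hypothetical per-group function with the (key, x) signature used by gapply.
# x is a local R data.frame holding the rows of one group.
summarise_group <- function(key, x) {
  # Two intermediate results with identical columns (illustrative only).
  df1 <- data.frame(stat = "min", value = min(x$value), stringsAsFactors = FALSE)
  df2 <- data.frame(stat = "max", value = max(x$value), stringsAsFactors = FALSE)

  # Return ONE data.frame instead of list(df1, df2). The `part` column
  # records which original data.frame each row came from.
  rbind(
    data.frame(part = "df1", df1, stringsAsFactors = FALSE),
    data.frame(part = "df2", df2, stringsAsFactors = FALSE)
  )
}

# Local check on an ordinary data.frame, no Spark involved.
local_group <- data.frame(value = c(3, 1, 2))
out <- summarise_group(list("a"), local_group)
```

With a function like this, the schema passed to gapply would describe the single stacked output, for example a structType with string fields for part and stat and a double field for value (assuming those column types). The two parts can then be separated again on the Spark side by filtering on part.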