
I would like to use gapply as described in https://spark.apache.org/docs/latest/sparkr.html#gapply

The problem is that my function returns a list of two data frames:

return(list(df1, df2))

How do I declare the output schema in this case?

MichaelChirico
bhomass

1 Answer


You cannot use a function that returns an arbitrary list. As per the gapply documentation (emphasis mine):

The function func takes as argument a key - grouping columns and a data frame - a local R data.frame. The output of func is a local R data.frame.

You might be able to make it work by treating each data.frame as a single row of a type equivalent to something like struct<col1:array<typeofcol1>, col2:array<typeofcol2>, ..., coln:array<typeofcoln>>, but only as long as both output data.frames have an identical schema.
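A simpler alternative to the struct-of-arrays idea is to stack both results into one data.frame with a tag column and split them apart again after gapply. This is my own substitution, not something the gapply documentation prescribes. The sketch below assumes both data.frames have the same columns. The column names key, value and part and the helpers group_func and run_on_spark are hypothetical.

```r
# Minimal sketch, assuming df1 and df2 share the same columns (key, value).
# gapply expects func to return ONE local data.frame, so instead of
# list(df1, df2) we tag each row with its origin and rbind them.

# Hypothetical per-group function: builds two results and stacks them.
group_func <- function(key, x) {
  # key is a list holding the grouping value(s) for this group
  df1 <- data.frame(key = key[[1]], value = sum(x$value), stringsAsFactors = FALSE)
  df2 <- data.frame(key = key[[1]], value = mean(x$value), stringsAsFactors = FALSE)
  df1$part <- "df1"  # marks rows that belong to the first result
  df2$part <- "df2"  # marks rows that belong to the second result
  rbind(df1, df2)
}

# Hypothetical Spark wrapper: only defined here, never called.
# Calling it requires SparkR and an active Spark session.
run_on_spark <- function(sdf) {
  # The schema describes the single stacked data.frame, including the tag column
  schema <- SparkR::structType(
    SparkR::structField("key", "string"),
    SparkR::structField("value", "double"),
    SparkR::structField("part", "string")
  )
  result <- SparkR::gapply(sdf, "key", group_func, schema)
  # Split the stacked output back into the two logical results
  list(
    df1 = SparkR::filter(result, result$part == "df1"),
    df2 = SparkR::filter(result, result$part == "df2")
  )
}

# Local check of the per-group logic, no Spark needed
local_group <- data.frame(key = c("a", "a"), value = c(1, 3), stringsAsFactors = FALSE)
out <- group_func(list("a"), local_group)
print(out)
```

Because group_func is plain R, it can be tested on a local data.frame before running it on the cluster. If the two results have different columns, you could take the union of their columns and fill the missing ones with NA before rbind. The schema then has to list every column.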