I created an empty Seq() using:
scala> var x = Seq[DataFrame]()
x: Seq[org.apache.spark.sql.DataFrame] = List()
I have a function called createSamplesForOneDay() that returns a DataFrame, which I would like to add to this Seq() x.
val temp = createSamplesForOneDay(some_inputs) // this returns a Spark DF
x = x + temp // this throws an error
I get the following error:
scala> x = x + temp
<console>:59: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: String
       x = x + temp
What I am trying to do is build a Seq() of DataFrames in a for loop and, at the end, union them all using something like this:
val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)
as mentioned in the question "Spark: How to union all dataframe in loop".
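For reference, here is a minimal sketch of the loop-then-union pattern described above. It rests on two points. First, Seq has no + method, so in Scala 2 the compiler falls back to the implicit string-concatenation conversion (any2stringadd), which is why the error says "required: String". Second, the append operator on Seq is :+, which returns a new Seq with the element added at the end. The body of createSamplesForOneDay below is a hypothetical stand-in (the real function and its inputs are not shown in the question), and the object and method names are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionInLoop {
  // Hypothetical stand-in for createSamplesForOneDay: returns a one-row DataFrame per day.
  def createSamplesForOneDay(spark: SparkSession, day: Int): DataFrame = {
    import spark.implicits._
    Seq((day, s"sample-$day")).toDF("day", "label")
  }

  // Collects one DataFrame per day into a Seq, then unions them all.
  def collectAndUnion(spark: SparkSession, days: Seq[Int]): DataFrame = {
    var x = Seq[DataFrame]()
    for (day <- days) {
      val temp = createSamplesForOneDay(spark, day)
      x = x :+ temp // :+ appends to the Seq; plain + is not defined on Seq
    }
    // reduce throws on an empty Seq, so days is assumed to be non-empty
    x.reduce(_ union _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-in-loop")
      .getOrCreate()
    collectAndUnion(spark, 1 to 3).show()
    spark.stop()
  }
}
```

Note that union matches columns by position, so this assumes every DataFrame returned by the function has the same schema in the same column order.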