
I created an empty Seq() using

scala> var x = Seq[DataFrame]()
x: Seq[org.apache.spark.sql.DataFrame] = List()

I have a function called createSamplesForOneDay() that returns a DataFrame, and I would like to add its result to this Seq() x.

val temp = createSamplesForOneDay(some_inputs) // this returns a Spark DF
x = x + temp // this throws an error 

I get the following error:

scala> x = x + temp
<console>:59: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: String
       x = x + temp

What I am trying to do is build a Seq() of DataFrames in a for loop and, at the end, union them all with something like this:

val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)

as described here: scala - Spark : How to union all dataframe in loop

Regressor

1 Answer


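You cannot append to a List using +. In Scala 2, Seq has no + method, so the compiler falls back to the implicit string concatenation from Predef (any2stringadd), which is why the error says it requires a String. Append with :+ instead: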

x = x :+ temp
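Putting the fix together with the goal from the question (build the Seq in a loop, then union everything), a minimal script-style sketch might look like this. To keep it runnable without a Spark cluster, it assumes a stand-in type: a Set[String] plays the role of a DataFrame, since Scala's Set also has a union method. The createSamplesForOneDay body here is hypothetical; with Spark you would use the real function returning a DataFrame.

```scala
// Stand-in for org.apache.spark.sql.DataFrame (assumption for illustration):
// Scala's Set also has `union`, so `reduce(_ union _)` works the same way.
type FakeDF = Set[String]

// Hypothetical stand-in for the question's createSamplesForOneDay().
def createSamplesForOneDay(day: Int): FakeDF =
  Set(s"sample-a-day$day", s"sample-b-day$day")

var x = Seq[FakeDF]()            // empty Seq, as in the question
for (day <- 1 to 3) {
  val temp = createSamplesForOneDay(day)
  x = x :+ temp                  // append; `x + temp` would not compile
}

// Combine all per-day results into one, as in newDFs.reduce(_ union _)
val unioned: FakeDF = x.reduce(_ union _)
println(unioned.size)
```

With real DataFrames, note that Dataset.union matches columns by position, not by name; unionByName exists if the per-day DataFrames might have their columns in a different order.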

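But since you have a List, you should rather prepend your elements: prepending (+:) takes constant time, whereas appending (:+) has to copy the whole list each time: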

x = temp +: x 
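One consequence of prepending in a loop is that the Seq ends up in reverse order. If the order matters to you (for example, if you later look up a day by position), a single reverse at the end restores it. A small sketch, again assuming Set[String] as a stand-in for DataFrame and a hypothetical createSamplesForOneDay:

```scala
// Stand-in for DataFrame (assumption for illustration only).
type FakeDF = Set[String]

// Hypothetical per-day builder.
def createSamplesForOneDay(day: Int): FakeDF = Set(s"day$day")

var x = Seq[FakeDF]()            // Seq() builds a List, so +: is cheap
for (day <- 1 to 3) {
  x = createSamplesForOneDay(day) +: x   // newest element goes to the front
}

// x is now day3, day2, day1; reverse once if you need day order
val inDayOrder = x.reverse
```

For a plain union the order usually does not matter, so the reverse step can often be skipped.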

Instead of adding elements one by one, you can write it more functionally if you put your inputs into a sequence too:

val inputs = Seq(....) // create Seq of inputs

val x = inputs.map(i => createSamplesForOneDay(i))
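The map-based version can be followed directly by the union. One detail worth knowing: reduce throws an UnsupportedOperationException on an empty Seq, while reduceOption returns None. The sketch below assumes Set[String] as a stand-in for DataFrame, and the inputs and createSamplesForOneDay body are hypothetical placeholders for the elided Seq(....) and the real function:

```scala
// Stand-in for DataFrame (assumption for illustration only).
type FakeDF = Set[String]

// Hypothetical per-day builder.
def createSamplesForOneDay(day: Int): FakeDF = Set(s"day$day")

val inputs = Seq(1, 2, 3)        // hypothetical inputs, one per day
val x: Seq[FakeDF] = inputs.map(i => createSamplesForOneDay(i))

// reduceOption avoids an exception when there are no inputs at all
val combined: Option[FakeDF] = x.reduceOption(_ union _)
val none: Option[FakeDF] = Seq.empty[FakeDF].reduceOption(_ union _)
```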
Raphael Roth
Thank you for the answer. The first part works like a charm. As for making the code more functional: what I am trying to do is create a DataFrame for each day, one by one, and store it somewhere, so the number of elements in the Seq() equals the number of days, i.e. the number of times the loop runs. – Regressor Jul 02 '19 at 07:46
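The map approach still produces exactly one element per day: you map over the days instead of looping over them. A sketch of that, assuming the days come from a date range built with java.time; the start date, number of days, Set[String] stand-in, and createSamplesForOneDay body are all illustrative assumptions, not taken from the question:

```scala
import java.time.LocalDate

// Stand-in for DataFrame (assumption for illustration only).
type FakeDF = Set[String]

// Hypothetical: builds the samples for a single date.
def createSamplesForOneDay(day: LocalDate): FakeDF = Set(s"sample-$day")

val start = LocalDate.of(2019, 7, 1)   // illustrative range start
val numDays = 5                        // illustrative number of days

// One date per loop iteration you would otherwise have run
val days: Seq[LocalDate] = (0 until numDays).map(i => start.plusDays(i.toLong))

// One result per day, so x.size == numDays
val x: Seq[FakeDF] = days.map(createSamplesForOneDay)
```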