I have a number of dataframes that are created inside a loop, and I want to union all of them. I tried to create a final dataframe that should contain all the small dataframes, but it does not seem to work: the union ends up holding only the last small dataframe. I read this similar question, and the solution suggested in the answer by @zero323 works fine when I run it in the shell:
scala> val a= sql("""select "1" as k""")
a: org.apache.spark.sql.DataFrame = [k: string]
scala> val b= sql("""select "2" as k""")
b: org.apache.spark.sql.DataFrame = [k: string]
scala> val c= sql("""select "3" as k""")
c: org.apache.spark.sql.DataFrame = [k: string]
scala> a.show
+---+
| k|
+---+
| 1|
+---+
scala> b.show
+---+
| k|
+---+
| 2|
+---+
scala> c.show
+---+
| k|
+---+
| 3|
+---+
To union the three dataframes above, I did the following:
scala> val g = Seq(a,b,c)
g: Seq[org.apache.spark.sql.DataFrame] = List([k: string], [k: string], [k: string])
scala> val s = g.reduce(_ union _)
s: org.apache.spark.sql.DataFrame = [k: string]
scala> s.show
+---+
| k|
+---+
| 1|
| 2|
| 3|
+---+
The problem
Now I am trying to do the same thing in Eclipse:
val g = Seq()
val dummyDf = ss.sql(s"select 0 as ss , a.* from table1 limit 1")
for (element <- 0 to arr.size-1) {
var strt: Int = arr.toList(element )
var nd: Int = arr.toList(element + 1)
val tempDF = ss.sql(s"select $strt as ss , a.* from table1 a where rnk between $strt+1 and $nd-1")
g :+ tempDF
}
val finalDf = g.reduce(_ union _)
but I got the following error message:
Multiple markers at this line:
◾missing parameter type for expanded function ((x$14: , x$15) ⇒ x$14.union(x$15))
◾identifier expected but '_' found.
◾missing parameter type for expanded function ((x$14: , x$15: ) ⇒ x$14.union(x$15))
Any help with this is highly appreciated.
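A possible reading of the error, sketched here as an assumption rather than a confirmed diagnosis: `Seq()` without a type parameter is inferred as `Seq[Nothing]`, so the compiler cannot work out the parameter types for `_ union _`. In addition, `g :+ tempDF` returns a new sequence and the result is discarded, so `g` stays empty, and `element + 1` runs past the end of `arr` on the last iteration. The sketch below collects the dataframes into a typed `List[DataFrame]` instead. The local `SparkSession`, the stand-in contents of `table1`, and the values in `arr` are hypothetical and only there to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionInLoop {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the real `ss`.
    val ss = SparkSession.builder()
      .master("local[*]")
      .appName("union-in-loop")
      .getOrCreate()
    import ss.implicits._

    // Hypothetical stand-in for table1: a single column named rnk.
    (1 to 10).toDF("rnk").createOrReplaceTempView("table1")

    // Hypothetical range boundaries; the real `arr` comes from elsewhere.
    val arr = Seq(0, 4, 8)

    // sliding(2) yields consecutive pairs (strt, nd), so there is no
    // element + 1 lookup that could run past the end of the collection.
    val parts: List[DataFrame] = arr.sliding(2).collect {
      case Seq(strt, nd) =>
        ss.sql(s"select $strt as ss, a.* from table1 a where rnk between $strt+1 and $nd-1")
    }.toList

    // The element type is now DataFrame, so `_ union _` type-checks.
    // Note: reduce throws on an empty list; reduceOption avoids that.
    val finalDf = parts.reduce(_ union _)
    finalDf.show()

    ss.stop()
  }
}
```

If a mutable accumulator is preferred, declaring `var g = Seq.empty[DataFrame]` and writing `g = g :+ tempDF` inside the loop should have the same effect.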
Edit:
For the other solution suggested in the question I linked to:
dfs match {
case h :: Nil => Some(h)
case h :: _ => Some(h.sqlContext.createDataFrame(
h.sqlContext.sparkContext.union(dfs.map(_.rdd)),
h.schema
))
case Nil => None
}
Where can I find the resulting unioned dataframe? The code compiles without errors, but I cannot access the resulting dataframe.
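For what it is worth, a `match` block in Scala is an expression, so its value is lost unless it is bound to a name or returned from a function. The sketch below wraps the snippet in a function returning `Option[DataFrame]` and then unwraps the result. The function name `unionAll`, the local `SparkSession`, and the three sample dataframes are assumptions added for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionAllExample {
  // Hypothetical wrapper: returns the unioned dataframe,
  // or None when the input list is empty.
  def unionAll(dfs: List[DataFrame]): Option[DataFrame] = dfs match {
    case h :: Nil => Some(h)
    case h :: _ => Some(h.sqlContext.createDataFrame(
      h.sqlContext.sparkContext.union(dfs.map(_.rdd)),
      h.schema
    ))
    case Nil => None
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the real one.
    val ss = SparkSession.builder()
      .master("local[*]")
      .appName("union-all")
      .getOrCreate()

    val dfs = List(
      ss.sql("""select "1" as k"""),
      ss.sql("""select "2" as k"""),
      ss.sql("""select "3" as k""")
    )

    // Bind the result to a val, then unwrap the Option to reach the dataframe.
    val result: Option[DataFrame] = unionAll(dfs)
    result match {
      case Some(df) => df.show()
      case None     => println("no dataframes to union")
    }

    ss.stop()
  }
}
```

Returning an `Option` makes the empty-list case explicit; `result.get` would also work but throws if the list was empty.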