I have four DataFrames in Spark Scala (Spark 2.3, the spark-sql_2.11 artifact, Scala 2.11.0), as follows:
ratingsDf
+-------+---+
|ratings| id|
+-------+---+
|      0|  1|
|      1|  2|
|      1|  3|
|      0|  4|
|      0|  5|
|      1|  6|
|      1|  7|
|      1|  8|
|      0|  9|
|      1| 10|
+-------+---+
GpredictionsDf
+-----------+---+
|gprediction| id|
+-----------+---+
|          0|  1|
|          1|  2|
|          1|  3|
|          1|  4|
|          1|  5|
|          1|  6|
|          1|  7|
|          1|  8|
|          0|  9|
|          1| 10|
+-----------+---+
RpredictionsDf
+-----------+---+
|rprediction| id|
+-----------+---+
|          0|  1|
|          1|  2|
|          1|  3|
|          1|  4|
|          1|  5|
|          1|  6|
|          1|  7|
|          1|  8|
|          1|  9|
|          1| 10|
+-----------+---+
LpredictionsDf
+-----------+---+
|lprediction| id|
+-----------+---+
|          0|  1|
|          1|  2|
|          1|  3|
|          0|  4|
|          1|  5|
|          1|  6|
|          1|  7|
|          1|  8|
|          0|  9|
|          1| 10|
+-----------+---+
I need to create a single DataFrame by joining all four DataFrames on the "id" column. I tried the following two ways to do this:
**Method 1:**
val ensembleDf = GpredictionsDf.join(RpredictionsDf, GpredictionsDf("id") === RpredictionsDf("id"))
  .join(LpredictionsDf, LpredictionsDf("id") === RpredictionsDf("id"))
  .join(ratingsDf, ratingsDf("id") === RpredictionsDf("id"))
  .select("gprediction", "rprediction", "lprediction", "ratings")
**Method 2:**
ratingsDf.createOrReplaceTempView("ratingjoin")
GpredictionsDf.createOrReplaceTempView("gpredjoin")
RpredictionsDf.createOrReplaceTempView("rpredjoin")
LpredictionsDf.createOrReplaceTempView("lpredjoin")

val ensembleDf = sqlContext.sql("SELECT gprediction, rprediction, lprediction, ratings " +
  "FROM gpredjoin, rpredjoin, lpredjoin, ratingjoin " +
  "WHERE gpredjoin.id = rpredjoin.id AND rpredjoin.id = lpredjoin.id AND lpredjoin.id = ratingjoin.id")
However, in both cases the join fails and returns an empty DataFrame:
ensembleDf.show()
+-----------+-----------+-----------+-------+
|gprediction|rprediction|lprediction|ratings|
+-----------+-----------+-----------+-------+
+-----------+-----------+-----------+-------+
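Before asking, a quick sanity check I am planning to run is whether the id columns even have matching types and overlapping values across the DataFrames (a minimal check against the exact DataFrames defined above):

// Confirm the id column has the same type in every DataFrame
ratingsDf.printSchema()
GpredictionsDf.printSchema()
RpredictionsDf.printSchema()
LpredictionsDf.printSchema()

// Count how many id values two of the DataFrames share;
// zero overlap (or a type mismatch) would explain the empty join
val overlap = ratingsDf.select("id").intersect(GpredictionsDf.select("id")).count()
println(s"ids shared by ratingsDf and GpredictionsDf: $overlap")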
Any idea why this could be happening? What code changes do I need to make to fix this?
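In case it helps, this is the shape of the result I am after, written with the Seq("id") join variant so that only a single id column survives (just a sketch against the same four DataFrames, not something I have confirmed behaves any differently):

val ensembleDf = ratingsDf
  .join(GpredictionsDf, Seq("id"))   // inner equi-join on the shared id key
  .join(RpredictionsDf, Seq("id"))
  .join(LpredictionsDf, Seq("id"))
  .select("gprediction", "rprediction", "lprediction", "ratings")

ensembleDf.show()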