
I have the following code, which throws the error shown below. I am unable to figure out the problem.

val someDF = Seq((8, "bat"),(64, "mouse"),(-27, "horse")).toDF("number", "word")
val someDF2 = Seq((8, "bat"),(64, "cat"),(-45, "tiger")).toDF("number", "word")
val someDF1= someDF.withColumn("Action",lit("A"))
val max_rows= someDF1.count().toInt
for(i <- 0 to max_rows) {
  if (someDF1("number") === someDF2("number")) {
    if (someDF1("word") === someDF2("word")) {
      someDF1("Action") = "D"
    } else {
      someDF1.unionAll(someDF2.select($"*", lit("D")))
    }
  }
}

error: value update is not a member of org.apache.spark.sql.DataFrame

              someDF1("Action") = "D"
                                ^
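
For context on the error text: Scala rewrites an assignment of the form `x(k) = v` into the method call `x.update(k, v)`, so the compiler looks for an `update` method on `DataFrame` and finds none. A minimal sketch of that rule, using two hypothetical classes (`MutableTable` and `ImmutableTable` are illustrations only and are not part of Spark):

```scala
import scala.collection.mutable

// Hypothetical mutable container: defines both apply and update.
final class MutableTable {
  private val cells = mutable.Map.empty[String, String]
  def apply(column: String): String = cells(column)
  def update(column: String, value: String): Unit = cells.update(column, value)
}

// Hypothetical immutable container: apply only, so `t(k) = v` cannot compile.
final class ImmutableTable(cells: Map[String, String]) {
  def apply(column: String): String = cells(column)
  // A "change" returns a new table, similar in spirit to DataFrame.withColumn.
  def withCell(column: String, value: String): ImmutableTable =
    new ImmutableTable(cells + (column -> value))
}

object UpdateDesugaring {
  def main(args: Array[String]): Unit = {
    val m = new MutableTable
    m("Action") = "D" // rewritten by the compiler to m.update("Action", "D")
    println(m("Action"))

    val t = new ImmutableTable(Map("Action" -> "A"))
    // t("Action") = "D" // would not compile: value update is not a member of ImmutableTable
    val t2 = t.withCell("Action", "D")
    println(t("Action"))  // the original table is unchanged
    println(t2("Action")) // the new table carries the new value
  }
}
```

The commented-out line is the same shape as `someDF1("Action") = "D"` in the question, which is why the compiler reports a missing `update` member rather than a type mismatch.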
  • The error message might be a little bit confusing, because `update` is the desugared method resulting from `someDF1("Action") = "D"`, which is equivalent to `someDF1.update("Action", "D")`. As the error message says, there is no `update`, because datasets are immutable. – Andrey Tyukin Feb 05 '19 at 15:39
  • In the duplicate: specifically the fourth bullet point in [this answer](https://stackoverflow.com/a/2662998/2707792). – Andrey Tyukin Feb 05 '19 at 15:44
  • Thanks @AndreyTyukin. Will look into it – codensleep Feb 05 '19 at 16:17
  • Instead of loops, I used a join, which worked for me:
        val cpdf = Seq((8, "bat"),(64, "mouse"),(-27, "horse")).toDF("number", "word")
        val tabdf = Seq((8, "bat"),(64, "cat"),(-45, "tiger")).toDF("no", "word")
        val cpdf1 = cpdf.withColumn("Action", lit("A"))
        val groupedData_col = cpdf1.as("cpdf1")
          .join(tabdf.as("tabdf"), col("cpdf1.word") === col("tabdf.word") && col("cpdf1.number") === col("tabdf.no"), "left_outer")
          .filter(col("tabdf.word").isNull)
          .select(col("cpdf1.*"))
    – codensleep Feb 08 '19 at 01:36
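
The points in the comments above (DataFrames are immutable, and a join can replace the row-by-row loop) can be sketched as follows. This is only one possible reading of what the loop in the question was meant to do, not a confirmed solution. The object `ActionLabels` and the helpers `markMatches` and `unmatched` are hypothetical names introduced for illustration. The sketch assumes Spark 2.x or later, a local session, and that both frames name their key columns `number` and `word` as in the question (the last comment uses `no` instead).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object ActionLabels {
  // Returns a new DataFrame in which rows of `left` that also occur in `right`
  // (same number and same word) get Action = "D"; every other row gets "A".
  def markMatches(left: DataFrame, right: DataFrame): DataFrame = {
    val keys = right.select(col("number"), col("word")).distinct()
      .withColumn("matched", lit(true))
    left.join(keys, Seq("number", "word"), "left_outer")
      .withColumn("Action", when(col("matched").isNotNull, lit("D")).otherwise(lit("A")))
      .drop("matched")
  }

  // Keeps only the rows of `left` that have no exact match in `right`.
  // A left anti join expresses "left outer join, then keep unmatched rows" directly.
  def unmatched(left: DataFrame, right: DataFrame): DataFrame =
    left.join(right, Seq("number", "word"), "left_anti")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder().master("local[*]").appName("action-labels").getOrCreate()
    import spark.implicits._

    val someDF = Seq((8, "bat"), (64, "mouse"), (-27, "horse")).toDF("number", "word")
    val someDF2 = Seq((8, "bat"), (64, "cat"), (-45, "tiger")).toDF("number", "word")

    markMatches(someDF, someDF2).show()
    unmatched(someDF.withColumn("Action", lit("A")), someDF2).show()

    spark.stop()
  }
}
```

Both helpers return new DataFrames instead of assigning into an existing one, which avoids the `update` error entirely. The `left_anti` variant should select the same rows as the `left_outer` join followed by the `isNull` filter in the last comment.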
