
I am parsing a date with the following statements, and the Spark program throws an error during execution.

import java.time.LocalDate
val filterDate = "2017-06-26"
val filter = LocalDate.parse(filterDate)

I then use the filter variable to filter records on a DataFrame column.

val df1 = spark.sql("select * from tableA")
val df2 = df1.filter(load_dt = filter)

While executing the above program, it threw the following error:

User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve '(tableA.`load_dt` = ((2017 - 6) - 26))' due to data type mismatch: differing types in '(tableA.`load_dt` = ((2017 - 6) - 26))' (date and int).; line 1 pos 0;

I don't understand why it evaluates (2017 - 6) and - 26 separately.

Could anyone please help? Thanks.

RSG

1 Answer


Spark's DataFrame API doesn't have an encoder for java.time.LocalDate, and since your load_dt column is already of type java.sql.Date, you don't need to convert filterDate to a LocalDate at all. The error message also shows what went wrong downstream: the unquoted value 2017-06-26 reached the SQL parser and was read as integer arithmetic, (2017 - 6) - 26, hence the (date and int) type mismatch. The other issue is that === (not =) is the equality operator in a filter condition, as in the following example:

import spark.implicits._  // for toDF and the $"..." column syntax

val filterDate = "2017-06-26"

val df1 = Seq(
  (1, java.sql.Date.valueOf("2017-06-25")),
  (2, java.sql.Date.valueOf("2017-06-26")),
  (3, java.sql.Date.valueOf("2017-06-27"))
).toDF("id", "load_dt")

val df2 = df1.filter($"load_dt" === filterDate)

df2.show
+---+----------+
| id|   load_dt|
+---+----------+
|  2|2017-06-26|
+---+----------+
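As an aside, the ((2017 - 6) - 26) in the error message is exactly what happens when the bare token 2017-06-26 reaches the SQL parser: it is read as integer subtraction, not as a date. A quick plain-Scala check (no Spark needed; the object name is just for illustration):

```scala
object UnquotedDateDemo extends App {
  // The SQL parser sees the unquoted token 2017-06-26 as arithmetic:
  val asArithmetic = 2017 - 6 - 26
  println(asArithmetic) // 1985
  // ...which is why Spark reports a (date and int) type mismatch
  // when comparing it against a DateType column.
}
```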

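If you do want to keep parsing with java.time.LocalDate on the driver side, note that it converts directly to the java.sql.Date that a Spark DateType column maps to (a JDK-only sketch, no Spark session needed; filterDate is the same value as above):

```scala
import java.time.LocalDate
import java.sql.Date

object LocalDateToSqlDateDemo extends App {
  val filterDate = "2017-06-26"
  val local: LocalDate = LocalDate.parse(filterDate) // ISO-8601 parse
  val sqlDate: Date = Date.valueOf(local)            // available since Java 8
  println(sqlDate) // 2017-06-26
}
```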
[UPDATE]

Per your comment, if filterColumn and filterValue are provided as variables:

Using Spark SQL:

df1.createOrReplaceTempView("df1Table")

val tableName = "df1Table"
val filterColumn = "load_dt"
val filterValue = "2017-06-26"

val df2 = spark.sqlContext.sql(
  "select * from " + tableName + " where " + filterColumn + " = '" + filterValue + "'"
)

// Alternative: use string interpolation `s"..."`
val df2 = spark.sqlContext.sql(
  s"select * from $tableName where $filterColumn = '$filterValue'"
)
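Spacing is the easy thing to get wrong when concatenating SQL fragments: without explicit separators the keywords run into the identifiers, while interpolation keeps the template readable. A quick JDK-only comparison (same variable values as above):

```scala
object SqlStringDemo extends App {
  val tableName = "df1Table"
  val filterColumn = "load_dt"
  val filterValue = "2017-06-26"

  // Concatenation without spaces glues the keywords to the identifiers:
  val glued = "select * from" + tableName + "where" + filterColumn +
    " = '" + filterValue + "'"
  println(glued) // select * fromdf1Tablewhereload_dt = '2017-06-26'

  // Interpolation makes the spacing explicit and easy to check by eye:
  val interpolated = s"select * from $tableName where $filterColumn = '$filterValue'"
  println(interpolated) // select * from df1Table where load_dt = '2017-06-26'
}
```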

Using Spark DataFrame API:

import org.apache.spark.sql.functions.col  // for col()

val df2 = df1.select("*").where(col(filterColumn) === filterValue)
Leo C
  • Thanks Leo for your answer. I was able to execute the program with '===' and passing filterDate = "2017-06-26" inside the program. I have an issue now that when I am trying to pass the filterValue as argument from spark-submit it is not reading it correctly as I cannot give the date under quotes " ". like val df = spark.sql("select * from" + tableName + "where" + filterColumn + " = " + " ' " + filterValue + " ' "). I already defined table name and filterColumn before this statement in program. – RSG Jan 12 '18 at 15:36
  • It looks like you're missing/mis-placing spaces in composing your SQL statement. Please see my updated answer. – Leo C Jan 12 '18 at 16:44