
I have:

import spark.implicits._
import org.apache.spark.sql.functions._

val someDF = Seq(
  (8, "K25", "2019-05-22"),
  (64, "K25", "2019-05-26"),
  (64, "K25", "2019-03-26"),
  (27, "K26", "2019-02-24")
).toDF("Number", "ID", "Date").withColumn("Date", to_date(col("Date")))

My aim is to filter this DataFrame by a date range. For example, suppose I want the rows whose Date falls within the three months before 2019-05-26. How should I approach this?

scalacode

1 Answer


You can use filter as follows:

val someDF = Seq(
  (8, "K25", "2019-05-22"),
  (64, "K25", "2019-05-26"),
  (64, "K25", "2019-03-26"),
  (27, "K26", "2019-02-24")
).toDF("Number", "ID", "Date").withColumn("Date", to_date(col("Date")))

// Upper bound of the range
val compareDate = to_date(lit("2019-05-26"), "yyyy-MM-dd")

// Keep rows strictly after (compareDate - 3 months) and strictly before compareDate
someDF.filter(
  $"Date" < compareDate &&
    $"Date" > add_months(compareDate, -3)
)

If you already know both boundary dates, you can also compare against plain date strings directly, as long as they are in the correct format (yyyy-MM-dd).
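For example, here is a minimal sketch of that variant (assuming the same someDF as above, and with the lower bound 2019-02-26, i.e. 2019-05-26 minus three months, worked out by hand; Spark coerces the string literals for the comparison with the date column):

// Same filter expressed with plain date-string literals
someDF.filter($"Date" > "2019-02-26" && $"Date" < "2019-05-26")

Both bounds are exclusive here, matching the answer above; if 2019-05-26 itself should be included, use <= instead (or Column.between, which is inclusive on both ends).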

Output:

+------+---+----------+
|Number|ID |Date      |
+------+---+----------+
|8     |K25|2019-05-22|
|64    |K25|2019-03-26|
+------+---+----------+
koiralo