I have two dataframes in Scala:
df1 =
ID start_date_time
1 2016-10-12 11:55:23
2 2016-10-12 12:25:00
3 2016-10-12 16:20:00
and
df2 =
PK start_date
1 2016-10-12
2 2016-10-14
I need to add a new column to df1 that will have value 0 if the following condition fails, otherwise 1: ID == PK and start_date_time refers to the same year, month and day as start_date.
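To make the condition concrete, here is a minimal plain-Scala sketch of what I mean for a single row, outside of the DataFrame API. The names CheckCondition and check are hypothetical, and I am assuming that start_date_time always looks like yyyy-MM-dd HH:mm:ss and start_date like yyyy-MM-dd, as in the samples above.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

object CheckCondition {
  // Assumed layout of start_date_time, taken from the df1 sample.
  private val dateTimeFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Returns 1 when some (PK, start_date) pair has PK == id and falls on the
  // same calendar day as startDateTime, otherwise 0.
  def check(id: Int, startDateTime: String, pairs: Seq[(Int, String)]): Int = {
    val day = LocalDateTime.parse(startDateTime, dateTimeFormat).toLocalDate
    // LocalDate.parse reads the ISO yyyy-MM-dd layout used by start_date.
    val matches = pairs.exists { case (pk, startDate) =>
      pk == id && LocalDate.parse(startDate) == day
    }
    if (matches) 1 else 0
  }
}
```

For example, CheckCondition.check(1, "2016-10-12 11:55:23", Seq(1 -> "2016-10-12", 2 -> "2016-10-14")) evaluates to 1, because both the ID and the calendar day match.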
The result should be this one:
df1 =
ID start_date_time check
1 2016-10-12-11-55-23 1
2 2016-10-12-12-25-00 0
3 2016-10-12-16-20-00 0
How can I do it?
I assume that the logic should be something like this:
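import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.udf

// (PK, start_date) pairs collected on the driver; a UDF cannot query df2 itself
val pairs = df2.collect().map(r => (r.get(0).toString, r.get(1).toString)).toSet

val define = udf { (id: String, dateString: String) =>
  val formatter = new SimpleDateFormat("yyyy-MM-dd")
  // parse reads only the leading yyyy-MM-dd part of start_date_time
  val date = formatter.format(formatter.parse(dateString))
  if (pairs.contains((id, date))) "1" else "0"
}

val result = df1.withColumn("check", define(df1("ID").cast("string"), df1("start_date_time").cast("string")))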
However, I have doubts about how to compare the dates, because df1 and df2 have differently formatted dates. How can I implement this better?
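One direction I am considering instead of a UDF, as a rough sketch only, is to reduce both columns to plain dates with to_date and then left-join df1 to df2 on the ID and the date. CheckColumnSketch and withCheck are hypothetical names, the helper columns d1 and d2 are my own, and I am assuming both date columns are strings or timestamps in the formats shown above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, to_date, when}

object CheckColumnSketch {
  // Hypothetical helper: adds a "check" column to df1 based on matching rows of df2.
  def withCheck(df1: DataFrame, df2: DataFrame): DataFrame = {
    // Reduce both columns to DateType so only year, month and day are compared.
    val left = df1.withColumn("d1", to_date(col("start_date_time")))
    // distinct() keeps repeated (PK, date) pairs from duplicating rows of df1.
    val right = df2.select(col("PK"), to_date(col("start_date")).as("d2")).distinct()

    left
      .join(right, left("ID") === right("PK") && left("d1") === right("d2"), "left_outer")
      // After a left outer join, PK is null exactly when no row of df2 matched.
      .withColumn("check", when(right("PK").isNotNull, lit(1)).otherwise(lit(0)))
      .select("ID", "start_date_time", "check")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("check-sketch").getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df1 = Seq(
      (1, "2016-10-12 11:55:23"),
      (2, "2016-10-12 12:25:00"),
      (3, "2016-10-12 16:20:00")
    ).toDF("ID", "start_date_time")
    val df2 = Seq((1, "2016-10-12"), (2, "2016-10-14")).toDF("PK", "start_date")

    withCheck(df1, df2).orderBy("ID").show(false)
    spark.stop()
  }
}
```

The join keeps the comparison inside Spark SQL, so df2 never has to be referenced from inside a function that runs on the executors. This sketch leaves start_date_time in its original format rather than the hyphenated one shown in the expected result.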