I have a dataframe with a few columns that contain dates. I want to validate those columns and, when a row contains an invalid date, flag that row in an extra error message column. My attempt does not work correctly. Here is my sample dataframe:
+-------+-----+-----------+-------------+
|AirName|Place|TakeoffDate|arriveoffDate|
+-------+-----+-----------+-------------+
|  Delta|  Aus|   11/16/18|     08/06/19|
|  Delta|  Pak|   11/16/18|     08/06/19|
| Vistra|  New|   11/16/18|     15/06/19|
|  Delta|  Aus|   15/16/18|     08/06/19|
| JetAir|  Aus|   11/16/18|         null|
+-------+-----+-----------+-------------+
I tried the code below, which only splits the rows into valid and invalid dataframes:
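import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}
import org.apache.spark.sql.Row

// The sample dates use slashes (11/16/18), so the pattern must use them too.
val DATE_TIME_FORMAT = "MM/dd/yy"

def validateDf(row: Row): Boolean = try {
  // Column index 2 is TakeoffDate. The values have no time part,
  // so they are parsed as LocalDate rather than LocalDateTime.
  LocalDate.parse(row.getString(2), DateTimeFormatter.ofPattern(DATE_TIME_FORMAT))
  true
} catch {
  // Thrown when the string cannot be parsed with the pattern.
  case _: DateTimeParseException => false
}

// Rows whose TakeoffDate parses, and the remaining rows.
val validDf = sample1.filter(validateDf(_))
val inValidDf = sample1.except(validDf)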
Expected dataframe:
+-------+-----+-----------+-------------+-------------+
|AirName|Place|TakeoffDate|arriveoffDate|error message|
+-------+-----+-----------+-------------+-------------+
|  Delta|  Aus|   11/16/18|     08/06/19|             |
|  Delta|  Pak|   11/16/18|     08/06/19|             |
| Vistra|  New|   11/16/18|     15/06/19|wrong date   |
|  Delta|  Aus|   15/16/18|     08/06/19|wrong date   |
| JetAir|  Aus|   11/16/18|         null|             |
+-------+-----+-----------+-------------+-------------+
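For reference, here is a minimal sketch of one way the expected output might be produced, written in spark-shell style. It assumes the dates follow the pattern MM/dd/yy, that a null date counts as valid (as in the last row of the expected output), and that both date columns are checked. The names datePattern, isValidDate, errorMessage, errorMessageUdf and withErrorMessage are hypothetical helpers introduced only for this illustration.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Assumed pattern: month/day/two-digit year, matching values like 11/16/18.
val datePattern = "MM/dd/yy"

// Hypothetical helper: a null value is treated as valid (an assumption
// taken from the expected output, where the null row has no message).
def isValidDate(value: String): Boolean =
  if (value == null) true
  else try {
    // The formatter is built inside the function so that the closure
    // shipped to executors only captures the pattern string.
    LocalDate.parse(value, DateTimeFormatter.ofPattern(datePattern))
    true
  } catch {
    case _: DateTimeParseException => false
  }

// Hypothetical helper: empty string when both dates pass, "wrong date" otherwise.
def errorMessage(takeoff: String, arrive: String): String =
  if (isValidDate(takeoff) && isValidDate(arrive)) "" else "wrong date"

// Wrap the plain function so it can be applied to dataframe columns.
val errorMessageUdf = udf(errorMessage _)

// Adds the "error message" column and keeps every input row.
def withErrorMessage(df: DataFrame): DataFrame =
  df.withColumn(
    "error message",
    errorMessageUdf(col("TakeoffDate"), col("arriveoffDate"))
  )
```

Calling withErrorMessage(sample1).show() would then display the original columns plus the message column. Because the column is added with withColumn instead of filter and except, no rows are dropped and the valid and invalid rows stay in one dataframe.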