I am using Scala and trying to filter a dataset on a string column that holds a date and time value. I have looked at several posts and tried SimpleDateFormat techniques, to no avail.
My sample data is:
+----------------------+
|my_date_str           |
+----------------------+
|12/11/2018 08:01:55 AM|
|12/11/2018 08:33:22 PM|
|12/13/2018 09:25:28 PM|
|12/17/2018 07:27:36 PM|
+----------------------+
I'd like to keep rows between 7pm and 9pm (date does not matter, only time). I would expect to keep these two rows of the four:
12/17/2018 07:27:36 PM
12/11/2018 08:33:22 PM
I can hack this together using substring functions, but I imagine there is a better way using a to_date or a unix function (I tried converting to seconds with unix_timestamp() and then extracting the time somehow), isolating the time, and checking the hour value.
// Filter down to rows between 7 and 9 PM using only substrings
my_data.withColumn("hour_str", substring($"my_date_str", 12, 8)) // the "hh:mm:ss" part
  // compare the extracted time, not the full date string
  .filter( (substring($"my_date_str", -2, 2) === "PM") && ($"hour_str" >= "07:00:00") && ($"hour_str" <= "09:00:00") )
  .show(truncate=false)
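For comparison, the "parse it, then check the hour value" idea described above might look roughly like the sketch below. It assumes Spark 2.2 or later (for to_timestamp with a format argument) and that every value matches the pattern MM/dd/yyyy hh:mm:ss a. The local SparkSession, the rebuilt sample data, and the names ts and evening are only there to make the example self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{hour, to_timestamp}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("hour-filter").getOrCreate()
import spark.implicits._

// Sample data copied from the table in the question.
val my_data = Seq(
  "12/11/2018 08:01:55 AM",
  "12/11/2018 08:33:22 PM",
  "12/13/2018 09:25:28 PM",
  "12/17/2018 07:27:36 PM"
).toDF("my_date_str")

// Parse the 12-hour string into a timestamp (assumed pattern), then keep
// rows whose 24-hour hour is 19 or 20, i.e. 7:00:00 PM through 8:59:59 PM.
val evening = my_data
  .withColumn("ts", to_timestamp($"my_date_str", "MM/dd/yyyy hh:mm:ss a"))
  .filter(hour($"ts") >= 19 && hour($"ts") < 21)

evening.select("my_date_str").show(truncate = false)
```

Note that the hour check excludes exactly 9:00:00 PM, while the substring version's <= "09:00:00" includes it. If that boundary matters, comparing date_format($"ts", "HH:mm:ss") against "19:00:00" and "21:00:00" would be one way to keep it inclusive.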
I have made too many failed attempts to include here, but these are a couple of the posts I used:
How to convert unix timestamp to date in Spark
How to convert String to date time in Scala?
In case it's not clear, the question is: how can I effectively filter a string date column by an hour range?