0

What would be a solution to detect date-time format

14/07/2009 19:15:29

Is this a fullproof solution?

str_detect(s,regex("([0-9]{2}/[0-9]{2}/[0-9]{4}) [0-9]{2}:[0-9]{2}:[0-9]{2}"))

For example for the format

14.07.2009

I have written the regex to be

str_detect(date,regex("([0-9]{2}\\.[0-9]{2}\\.[0-9]{4})"))

I don't have much idea regarding regex in R or regex in general, just the very basic stuff so would appreciate an easy approach with detailed logic. Thanks in advance.

Nil_07
  • 106
  • 1
  • 8

1 Answers1

1

As a beginner, I sometimes found it helpful to assemble the pattern as follows:

c(
  "[0-9]{2}", # day
  "/",
  "[0-9]{2}", # month
  "/",
  "[0-9]{4}", # year
  " ",
  "[0-9]{2}", # Hour
  ":",
  "[0-9]{2}", # minute
  ":",
  "[0-9]{2}"  # second
) |> paste(collapse = "")

Returns the pattern:

[1] "[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}"
stringr::str_detect("14/07/2009 19:15:29",
                    "[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}")
# [1] TRUE

Update (as per comments)

Here is how you could use the lubridate package. dmy_hms() finds datetimes in your format:

lubridate::dmy_hms("14/07/2009 19:15:29")

# [1] "2009-07-14 19:15:29 UTC"

But it will not parse invalid dates:

lubridate::dmy_hms("14/07/2009 19:15:70") # invalid seconds

# [1] NA

So to validate you could do:

(! is.na(lubridate::dmy_hms("14/07/2009 19:15:29")))

# [1] TRUE
ktiu
  • 2,606
  • 6
  • 20
  • What about dates which don't exist like 99/99/9999 99:99:99? Won't it return TRUE for all these regexes as well? – Nil_07 Jun 22 '21 at 16:17
  • 1
    Yes, it would! In order to validate dates I would suggest using an existing package like `lubridate` – ktiu Jun 22 '21 at 16:20
  • Hi, I went through lubridate package. Can you tell me how to use it in this case? – Nil_07 Jun 23 '21 at 08:01
  • If this date type is already in a POSIXct format then it will be already loaded into R in this format **2009-07-14 19:15:29 UTC**. What if the issue is in the Excel file, the date looks like **14-07-2009 19:15:29** and the type is POSIXct? Then using your code this will also return TRUE but the purpose won't be satisfied – Nil_07 Jun 23 '21 at 08:24
  • It would be helpful if you could provide us with a reproducible [minimal working example](https://en.wikipedia.org/wiki/Minimal_working_example) that we can copy and paste to better understand the issue and test possible solutions. You can share datasets with `dput(YOUR_DATASET)` or smaller samples with `dput(head(YOUR_DATASET))`. (See [this answer](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#5963610) for some great advice.) – ktiu Jun 23 '21 at 08:28
  • [This](https://stackoverflow.com/questions/68096002/r-are-there-multiple-posixct-date-formats-and-if-so-how-to-convert-one-posixct?noredirect=1#68096002) will give you a better idea. This question talks about changing the date format. All I am asking you is how to validate in such a scenario? The question could also be looked upon as how to validate whether the date format is in the desired format in the Excel file. – Nil_07 Jun 23 '21 at 08:38
  • This is nonsensical: If it's already POSIXct then it's already valid. Without example data and a clear description of what you want to achieve I cannot help you. – ktiu Jun 23 '21 at 08:50