I have a data file that contains dates in the following formats:

  • 2017-01-04 00:00:00 (Y-d-m)
  • 2017-03-22 00:00:00 (Y-m-d)
  • 06/04/2017 (d/m-Y)

When I import this into R, the column is read as character. I want to convert it to the Date class, but when I do, some values come out as NA.

I have tried this:

df_TEST$UTC_date <- as.Date(ifelse(grepl("-", df_TEST$UTC_date),
                            as.Date(df_TEST$UTC_date, format = c("%Y-%m-%d")),
                            as.Date(df_TEST$UTC_date, format = c("%d/%m/%Y"))), origin = "1970-01-01")

but it doesn't distinguish between (Y-d-m) and (Y-m-d). How can I convert these dates into a single date format in R?

  • Is the information `(Y-d-m)`, `(Y-m-d)` and `(d/m-Y)` included in the data? – GKi Oct 25 '22 at 11:15
  • No, it's only for the sake of explanation. – Ayman Almousa Oct 25 '22 at 11:17
  • Your dates are ambiguous, so the dupe links I'm providing (one for POSIXct, one for Dates that does the same basic steps) will work but you'll need to self-determine which of "Y-d-m" and "Y-m-d" you would accept for your first string (since it could realistically be either). Choose the order of the "candidates" (in the posted answers) carefully, as the first found will be retained. – r2evans Oct 25 '22 at 11:17
  • 2017-01-04 00:00:00 2017-03-22 00:00:00 06/04/2017 – Ayman Almousa Oct 25 '22 at 11:17
  • Is there any other information that would let us distinguish, e.g. for `2017-01-04`, whether it is `(Y-d-m)` or `(Y-m-d)`? – GKi Oct 25 '22 at 11:18
  • AymanAlmousa, please check the answers in the dupe-links and confirm that the use of "candidate formats" will suffice for you. If not, please @ping me and mention what doesn't work, and I can reopen the question. – r2evans Oct 25 '22 at 11:20
  • @r2evans I already tried the candidate formats. They do not recognize the difference between Y-m-d and Y-d-m. – Ayman Almousa Oct 25 '22 at 11:42
  • ***Nothing*** (not even a human) can "know" the difference between those two (for your first date) unless there is other context (as GKi requested). That aside, if you acknowledge that you think it is more likely to be Ymd before Ydm, then `formats <- c("%Y-%m-%d", "%Y-%d-%m", "%m/%d/%Y")` should work. If your issue is that you want the code to automatically know which of those is "right" for `2017-01-04`, then the answer is "no code can know that" (with no further context). – r2evans Oct 25 '22 at 11:47
  • The other context may be: 2017-01-04 should be 01-04-2017 because 04 is between 01 and 12; 2017-01-22 should be 22-01-2017 because 22 is bigger than 12. – Ayman Almousa Oct 25 '22 at 12:11
  • I'm not sure I understand: both `01` and `04` are between `01` and `12`, so they are both good candidates. I'm not contesting `2017-01-22` or `2017-22-01`, they are unambiguous (and the candidates-method will work correctly). What I'm saying is that `01` and `04` can plausibly be either month or day, and no trickery in the world will know with certainty which is which. This is why you either (a) must have other context that makes the choice unambiguous, or (b) should choose a most-likely order of candidate formats to reduce (but not remove) your chance of error. – r2evans Oct 25 '22 at 12:26
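
The "candidate formats" idea from the comments can be sketched roughly as follows. This is only an illustration, not the code from the linked duplicates: the helper name `parse_mixed_dates` and the sample vector `x` are made up here, and the order of `formats` encodes the assumption that Y-m-d is more likely than Y-d-m. Each format is tried in turn, and only the elements that are still NA are retried with the next one.

```r
# Hypothetical helper (not from the thread): try each candidate format in
# order and keep the first one that parses for each element.
parse_mixed_dates <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)            # elements not parsed yet
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Sample values modelled on the question; the third one is an invented
# unambiguous Y-d-m case (22 cannot be a month).
x <- c("2017-01-04 00:00:00", "2017-03-22 00:00:00",
       "2017-22-03 00:00:00", "06/04/2017")

# Assumption: prefer Y-m-d over Y-d-m when both would parse.
formats <- c("%Y-%m-%d", "%Y-%d-%m", "%d/%m/%Y")

parsed <- parse_mixed_dates(x, formats)
format(parsed)
# "2017-01-04" "2017-03-22" "2017-03-22" "2017-04-06"
```

Note that `2017-01-04` comes out as 4 January only because `%Y-%m-%d` is listed first; putting `%Y-%d-%m` first would give 1 April instead. The ordering reduces the chance of error for ambiguous strings but cannot remove it.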
