2

I can receive files where date can be in one of the two formats: "mm/dd/yyyy" or "yyyy-mm-dd". They are factors to start with. Irrespective of the date format i receive it in, i want to be able to convert them to 'Date' data type of format "yyyy-mm-dd". I have tried using

df_1$Date <- as.Date(as.character(df_1$Date), format = "%Y-%m-%d")

This works with the format "yyyy-mm-dd" but gives NA when input values are of format "mm/dd/yyyy". Similarly, other methods i have tried works for only one of these 2 formats. I need it to work for both.

Below i have posted code for creating the datasets and replicating the issue.

df_1 <- structure(list(Text.Identifier = c(4L, 5L, 7L, 1838L), Date = structure(c(2L, 
                                                                      2L, 1L, 3L), .Label = c("5/18/2016", "7/12/2015", "8/29/2016"
                                                                      ), class = "factor")), .Names = c("Text.Identifier", "Date"), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                        -4L))

df_2 <- structure(list(Text.Identifier = 1:4, Date = structure(c(5L, 
                                                     5L, 5L, 1L), .Label = c("2015-07-12", "2016-05-01", "2016-05-05", 
                                                                             "2016-05-09", "2016-05-12", "2016-05-18", "2016-08-01", "2016-08-19", 
                                                                             "2016-08-29", "2016-09-20"), class = "factor")), .Names = c("Text.Identifier", 
                                                                                                                                         "Date"), row.names = c(NA, 4L), class = "data.frame")

df_1$Date <- as.Date(df_1$Date, format = "%Y-%m-%d")
df_2$Date <- as.Date(df_2$Date, format = "%Y-%m-%d")
View(df_1)
View(df_2)
  • 2
    Check out `parse_date_time` in the `lubridate` package. Also it worth re-orienting your thinking from excel where exact order mattered into a more R way of thinking where it is more important what class you are dealing with. – boshek Dec 03 '16 at 00:08
  • Potential duplicate from 7 hours ago http://stackoverflow.com/a/40940164/5472462 that answers this question very well. – vincentmajor Dec 03 '16 at 04:02
  • Possible duplicate of [Function to automatically apply correct as.Date format in R](http://stackoverflow.com/questions/40940016/function-to-automatically-apply-correct-as-date-format-in-r) – vincentmajor Dec 03 '16 at 04:02
  • @vincentmajor That doesn't look like a dupe to me. I don't think that `lubridate` can handle dates with mixed MDY / YMD order. – RHertel Dec 03 '16 at 12:35
  • @RHertel I believe it can. `library(lubridate); data = c("01-12-2000", "02-11-2001", "2016-06-20", "2016-12-05"); parse_date_time(data, c('dmy', 'ymd'))` – vincentmajor Dec 05 '16 at 14:29
  • @vincentmajor Agreed. One will need to wrap the output into `as.Date()`, but yes, if the possible formats are specified, then it works. In my opinion, the need for such hard-coding of the formats means that the `lubridate` library cannot really handle the different dates, in the sense that it needs assistance from the user. However, the `anytime` library can recognize virtually any date without any further input. It is more flexible and concise than the linked solution. I therefore think that it is better to refer to the `anytime` package than to the linked answer in this case. – RHertel Dec 05 '16 at 17:24
  • @vincentmajor Thanks a lot. That worked. – Suraj Tata Prasad Dec 07 '16 at 20:35
  • @SurajTataPrasad You are welcome. I added an answer from my comment above. Go ahead and mark this question answered if it was helpful. Thanks. – vincentmajor Dec 07 '16 at 22:04

3 Answers3

3

I suggest using the anydate() function from the anytime library. It is better suited to this case than lubridate's parse_date_time(), since it recognizes the dates without requiring any user input concerning the order of the entries, like ymd or dmy. Furthermore, there is no problem if the data is stored as factors.

Here's an example:

my_dates <- c("2015-07-12", "2016-05-01", "2016-05-05", "2016-05-09", 
 "2016-05-12", "2016-05-18", "2016-08-01", "2016-08-19", "2016-08-29",
 "2016-09-20", "5/18/2016", "7/12/2015", "8/29/2016")
my_dates <- as.factor(my_dates)
library(anytime)
anydate(my_dates)
# [1] "2015-07-12" "2016-05-01" "2016-05-05" "2016-05-09" "2016-05-12" "2016-05-18"
# [7] "2016-08-01" "2016-08-19" "2016-08-29" "2016-09-20" "2016-05-18" "2015-07-12"
#[13] "2016-08-29

One can verify that the class of the output is indeed Date

class(anydate(my_dates))
#[1] "Date"

whereas another posted solution yields a POSIX object with unnecessary time zone information.

RHertel
  • 23,412
  • 5
  • 38
  • 64
1

If you're sure the only two date formats are "mm/dd/yyyy" or "yyyy-mm-dd", this should work. This solution uses grep to find a forward slash in the dates, and converts those dates from "mm/dd/yyyy" to "yyyy-mm-dd".

Here is an example:

dates <- data.frame(date = c("2015-11-01", "12/12/2016", "1992-05-28", "03/05/2011"), stringsAsFactors = FALSE)

dates$date[grep("/", dates$date)] <- as.character(as.Date(dates$date[grep("/", dates$date)], "%m/%d/%Y"))
hoggue
  • 147
  • 7
  • You are using `stringsAsFactors = FALSE`, but the OP stated that the dates are stored as factors. – RHertel Dec 03 '16 at 13:58
1

Adding my comment as an answer so we can mark this question answered.

The lubridate package is smart enough to choose between different date separators and also different date orders.

library(lubridate)
data = c("01-12-2000", "02-11-2001", "2016-06-20", "2016-12-05")
parse_date_time(data, c('dmy', 'ymd'))

## returns
## [1] "2000-12-01 UTC" "2001-11-02 UTC" "2016-06-20 UTC" "2016-12-05 UTC"
vincentmajor
  • 1,076
  • 12
  • 20