1

I have a messy dataframe with thousand variables and want to automate conversion of specific columns to dates without having to specify which columns explicitely. All columns to convert have "Date" in their name. Most are mdy but they also can be dmy. Some contain errors, or malformatted dates but in a very very minor proportion <0.1%.

I tried:

df %>% select(contains("Date")) %>% as_Date() #Does not work
df %>%  select(contains("Date"))  %>% mdy() #selecting only the columns with dates, does not work
df %>% select(contains("Date")) %>% parse_date_time( c("mdy", "dmy")) #also does not work

I think I dont get something fundamental.

ECII
  • 10,297
  • 18
  • 80
  • 121
  • Have you tried `df[] <- lapply(df, type.convert, as.is = TRUE)` or `as.Date(df[,grep("Date", names(df))])`. Can you post example data? – Skaqqs Oct 23 '21 at 10:30
  • 2
    Or, similarly, `df <- df %>% muate(across(contains("Date"), as.Date))`? I think you need `across`, not `select`... – Limey Oct 23 '21 at 10:35
  • More info on how to deal with different date formats in the same field: https://stackoverflow.com/questions/13764514/how-to-change-multiple-date-formats-in-same-column – Skaqqs Oct 23 '21 at 10:40

2 Answers2

3

Here's a solution based on lubridate:

Toy data:

df <- data.frame(Date1 = c("01-Mar-2015", "31-01-2012", "15/01/1999"), 
                 Var_Date = c("01-02-2018", "01/08/2016", "17-09-2007"), 
                 More_Dates = c("27/11/2009", "22-Jan-2013", "20-Nov-1987"))

# define formats:
formats <- c("%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y")

A dyplrsolution:

library(dplyr)
library(lubridate)
df %>% 
  mutate(across(contains("Date"), 
                ~ parse_date_time(., orders = formats))) %>%
  mutate(across(contains("Date"),
                ~ format(., "%d-%m-%Y")))
       Date1   Var_Date More_Dates
1 01-03-2015 01-02-2018 27-11-2009
2 31-01-2012 01-08-2016 22-01-2013
3 15-01-1999 17-09-2007 20-11-1987

A base Rsolution:

library(lubridate)
df[,grepl("Date", names(df))] <- apply(df[,grepl("Date", names(df))], 2, 
                  function(x) format(parse_date_time(x, orders = my_formats), "%d-%m-%Y"))
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
0

We could use parse_date from parsedate

library(parsedate)
library(dplyr)
df %>%
    mutate(across(everything(), parse_date))
       Date1   Var_Date More_Dates
1 2015-03-01 2018-01-02 2009-11-27
2 2012-01-31 2016-01-08 2013-01-22
3 1999-01-15 2007-09-17 1987-11-20

data

df <- structure(list(Date1 = c("01-Mar-2015", "31-01-2012", "15/01/1999"
), Var_Date = c("01-02-2018", "01/08/2016", "17-09-2007"), More_Dates = c("27/11/2009", 
"22-Jan-2013", "20-Nov-1987")),
 class = "data.frame", row.names = c(NA, 
-3L))
akrun
  • 874,273
  • 37
  • 540
  • 662