I have CSV data that I am in the process of cleaning.

It contains date columns with xxx_date as part of the column name. However, the entries (straight from the CSV) are in either mm/dd/yyyy or mm/dd/yy format. In the yy case, there can be entries such as 2/2/01 or 2/2/97, which indicate the years 2001 and 1997 respectively.

I was hoping someone could help me write a function that automatically recognizes the word "date" in a column name and converts those entries to Date format (despite the heterogeneity).

I've tried this:

date_cols <- grep('date$', names(xdb6)) # indices of all columns whose names end in "date"
xdb6[date_cols] <- lapply(xdb6[date_cols], as.Date) # stuck here: as.Date() is called with no format argument

but it doesn't work.

Is there any way I can do this?

Thanks

DL

David Lee
  • Please read the information at the top of the [tag:r] tag page and provide input using `dput` and expected output as well as any code attempt. – G. Grothendieck May 05 '21 at 18:35
  • Ambiguities in date formats are why packages like `lubridate`, `anytime`, etc have been written. Some of them support multiple "candidate" formats, hoping to find the one that seems the most likely given the data. There are also numerous questions and answers on SO that already address this. One that should work well for you is https://stackoverflow.com/a/52319606/3358272, where my answer cycles through various possible formats. Good luck! – r2evans May 05 '21 at 18:35
  • BTW, your logic of using `xdb6[.] <- lapply(xdb6[.], ...)` is perfect, stick with that. Now take the code from my other answer, formalize it by turning it into a simple function, and put it in your `lapply`. – r2evans May 05 '21 at 18:36
  • Dear Evans, thank you for your prompt response! Could you please elaborate on what you mean by formalizing your code into a function and putting it in the `lapply`? Thanks – David Lee May 05 '21 at 20:36
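
To illustrate what "turn it into a function and put it in your `lapply`" could look like, here is a minimal base R sketch. It is not the code from the linked answer (which cycles through candidate formats). Instead it checks whether the year has two or four digits and picks `%y` or `%Y` accordingly. The helper name `parse_mixed_date` and the sample data frame `xdb6_demo` are made up for this example, and it assumes every date is either m/d/yy or m/d/yyyy; anything else (blanks, other layouts, values that are already dates) comes back as `NA`.

```r
# Hypothetical helper: convert a vector of m/d/yy or m/d/yyyy strings to Date.
# Entries matching neither pattern are returned as NA.
parse_mixed_date <- function(x) {
  x <- trimws(as.character(x))          # also handles factor columns
  out <- rep(as.Date(NA), length(x))    # start with all-NA Date vector

  # Decide the format by the width of the year field
  four <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", x)
  two  <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$", x)

  out[four] <- as.Date(x[four], format = "%m/%d/%Y")
  # %y maps 00-68 to 20xx and 69-99 to 19xx, so 01 -> 2001 and 97 -> 1997
  out[two]  <- as.Date(x[two],  format = "%m/%d/%y")
  out
}

# Made-up sample data standing in for the real xdb6
xdb6_demo <- data.frame(
  id         = 1:3,
  start_date = c("2/2/01", "2/2/97", "12/31/2015"),
  end_date   = c("03/15/1999", "", NA),
  stringsAsFactors = FALSE
)

# Same pattern as in the question: find the date columns, then lapply
date_cols <- grep("date", names(xdb6_demo))
xdb6_demo[date_cols] <- lapply(xdb6_demo[date_cols], parse_mixed_date)

str(xdb6_demo)
```

Note that the `%y` cutoff is fixed, so a two-digit year such as 50 would be read as 2050. If the data contains birth dates or other older dates, the century would need separate handling.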

0 Answers