0

I've a data frame with date field as factor and have mix of values as below. How can I standardize, convert into date format and extract month and year?

1  Oct 24 2013  3:59PM     
2  Nov  5 2013  3:00PM     
3  Nov 26 2013  1:00PM      
4  2015-05-05 21:09:00      
5  Nov 19 2013  1:00PM      
6  2015-05-28 20:23:00      
7  2015-05-28 20:24:00      
8  Nov 12 2013  1:00PM      
9  2015-05-28 20:29:00      
10 2015-05-28 20:26:00      
Rav
  • 1
  • what you have tried so far? – Enamul Hassan Oct 10 '15 at 00:16
  • Nothing useful [here](http://stackoverflow.com/questions/28618004/convert-dates-with-different-formats-in-data-frame-over-multiple-columns), [here](http://stackoverflow.com/questions/13764514/how-to-change-multiple-date-formats-in-same-column) or [here](http://stackoverflow.com/questions/4067761/format-a-date-column-in-a-data-frame)? –  Oct 10 '15 at 00:19

2 Answers2

1

You could try parse_date_time() from the lubridate package. I find it makes dealing with multiple formats a lot easier. It's just a matter of fiddling around with the orders argument. Here we can use c("mdyR", "ymdT") for our orders vector.

library(lubridate)
parse_date_time(df$V1, c("mdyR", "ymdT"))
# [1] "2013-10-24 15:59:00 UTC" "2013-11-05 15:00:00 UTC"
# [3] "2013-11-26 13:00:00 UTC" "2015-05-05 21:09:00 UTC"
# [5] "2013-11-19 13:00:00 UTC" "2015-05-28 20:23:00 UTC"
# [7] "2015-05-28 20:24:00 UTC" "2013-11-12 13:00:00 UTC"
# [9] "2015-05-28 20:29:00 UTC" "2015-05-28 20:26:00 UTC"

To extract the month and year, we, can do the following.

pdt <- parse_date_time(df$V1, c("mdyR", "ymdT"))
month(pdt)
# [1] 10 11 11  5 11  5  5 11  5  5
year(pdt)
# [1] 2013 2013 2013 2015 2013 2015 2015 2013 2015 2015

Data:

df <- structure(list(V1 = structure(c(10L, 9L, 8L, 1L, 7L, 2L, 3L, 
6L, 5L, 4L), .Label = c("2015-05-05 21:09:00", "2015-05-28 20:23:00", 
"2015-05-28 20:24:00", "2015-05-28 20:26:00", "2015-05-28 20:29:00", 
"Nov 12 2013  1:00PM", "Nov 19 2013  1:00PM", "Nov 26 2013  1:00PM", 
"Nov  5 2013  3:00PM", "Oct 24 2013  3:59PM"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-10L))
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
1

See if data can be parsed with one format (the default for as.POSIXct.factor in this case) and then try the other if unsuccessful:

 dats$dt2 <- as.POSIXct( # Needed b/c get numeric values from the `if(){}`; Why?
              sapply(trim(dats$dt),   # sjmisc:trim() only needed if have extra spaces
                function(d) if( !is.na( strptime(d, "%Y-%m-%d %H:%M:%S") ) ){
                       as.POSIXct(d)      } else {
                       as.POSIXct( d, format="%b %d %Y  %H:%M%p") }), origin="1970-01-01" )
IRTFM
  • 258,963
  • 21
  • 364
  • 487