2

I have a column within a dataframe that contains date information that is stored in multiple forms e.g.

dob <- c("18/02/1983","36782","01/11,1999 11:55",
         "Dead 12/12/2001","Alive 03/07/1985")

I'd like to convert all of these into POSIXlt dates of the format "%d/%m/%Y", with no time info, and no "Dead" or "Alive" prefixing them.

All previous solutions to similar problems involve only two possible formats, that I can tell, whereas I have more, and also I have the problem of converting the 5 digit number format dates (which I think are numeric dates with origin 1970-01-01 from Excel) also. I'm using R-3.1.2. Any help would be gratefully received!

Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
user4575913
  • 507
  • 1
  • 7
  • 16

2 Answers2

5

lubridate is flexible as far as date formatting is concerned.

Depending on how representative your sample of 5 dates is of your larger dataset, you could parse them once to filter out those that are generally day-month-year. Then, assuming the remainder are the dodgy Excel dates, parse them separately according to their proper origin (word of warning, this might depend on which OS the Excel file was created on).

library(lubridate)
d <- parse_date_time(dob, c('%d%m%y', '%d%m%y %H%M'))
d[is.na(d)] <- as.POSIXct(as.Date(as.numeric(dob[is.na(d)]), 
                                  origin = "1899-12-30"))

d

## [1] "1983-02-18 00:00:00 UTC" "2000-09-13 00:00:00 UTC" "1999-11-01 11:55:00 UTC"
## [4] "2001-12-12 00:00:00 UTC" "1985-07-03 00:00:00 UTC"

AFAIK you can't omit time info from POSIXlt/POSIXct objects, so if you want to do that you might need to:

as.Date(d)

which will be in %d-%m-%Yformat (in my locale),

or

format(d, '%d/%m/%Y')

which will be character.

Community
  • 1
  • 1
jbaums
  • 27,115
  • 5
  • 79
  • 119
1

There's no easy or simple solution for this, you'll have to at least go through the formats by hand. But saying that, the pseudocode for solution would be something like this:

tmp <- grepl('[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]', dob))
formats[tmp] <- '%d/%m/%Y'
tmp <- grepl( other format, dob)
formats[tmp] <- 'other format'
etc...
dates <- strptime(dob, formats)
LauriK
  • 1,899
  • 15
  • 20