Using R

I've got a large clinical health data set to play with, but the dates are in odd formats.

The most problematic format is a two-digit year plus half-year, as in 98/2, meaning some point in 1998 after July 1.

I have split the column into two character columns, e.g. 98 and 2, but now need to convert the two-digit year string into an actual year.

I tried as.Date(data$variable, format="%Y"), but not only was the year converted to 0098 rather than 1998, today's month and day were also arbitrarily added (the actual data has no month or day).

as in 0098-06-11

How do I get just 1998 instead?

neilfws
LeanneB

1 Answer

Not elegant, but you can get that by combining lubridate's year() with as.Date(). The key change is the lowercase %y (two-digit year) instead of %Y (four-digit year).

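library(lubridate)
# Example data: two-digit years stored as numbers
data <- data.frame(variable = c(95, 96, 97, 98, 99), date = c(1, 2, 3, 4, 5))
# Parse as a two-digit year (%y), then extract the four-digit year as a number
data$variableUpdated <- year(as.Date(as.character(data$variable), format = "%y"))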

And with base R only (this returns the year as a character string):

data$variableUpdated <- format(as.Date(as.character(data$variable), format="%y"),"%Y")
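If the month and day that as.Date fills in are unwanted, the year can also be computed with plain arithmetic, skipping Date objects entirely. The following is only a sketch: the sample values in `raw` are made up, and the cutoff copies the rule that `%y` applies on input (00 to 68 become 20xx, 69 to 99 become 19xx). For clinical data known to be entirely from the 1900s, `1900L + yy` alone may be the better choice.

```r
# Hypothetical sample values in the "yy/half" layout described in the question
raw <- c("98/2", "99/1", "01/2", "68/1", "69/2")

# Split each value on "/" into a two-digit year and a half-year indicator
parts <- strsplit(raw, "/", fixed = TRUE)
yy   <- as.integer(vapply(parts, `[`, character(1), 1))
half <- as.integer(vapply(parts, `[`, character(1), 2))

# Same pivot that %y uses on input: 00-68 -> 20xx, 69-99 -> 19xx
year4 <- ifelse(yy <= 68, 2000L + yy, 1900L + yy)

result <- data.frame(raw = raw, year = year4, half = half)
print(result)
```

This works on the original unsplit column, so the intermediate two-column split is not needed.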
Theo
  • I apologize, I have only been using R for a couple of weeks. Can you explain how to assign that to the variable? I've been doing this with code like: data$variableUpdated <- as.Date(data$variable,format="%Y"). Do I slip the years list in the middle? data$variableUpdated <- format(as.Date(data$variable,, years, format ("%y")),"%Y")) – LeanneB Jun 11 '19 at 05:55
  • I tried the code I wrote above and it made mostly NAs and replaced a few dates with today's date (which doesn't exist in the data set). What did I do wrong? – LeanneB Jun 11 '19 at 06:03
  • Yes, you are right. Add as.character to convert it to a string: `data$variableUpdated <- format(as.Date(as.character(data$variable), format = "%y"), "%Y")` – Theo Jun 11 '19 at 06:11
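When a conversion returns mostly NAs, it can help to check which entries actually look like two-digit years before converting. A minimal sketch, assuming the year column is character; `yy_chr` and its values are hypothetical stand-ins for the real column:

```r
# Hypothetical year column containing some malformed entries
yy_chr <- c("98", "99", " 7", "", NA, "1998")

# TRUE only for entries that are exactly two digits
ok <- !is.na(yy_chr) & grepl("^[0-9]{2}$", yy_chr)

# Convert the valid entries; everything else stays NA
year4 <- rep(NA_integer_, length(yy_chr))
yy <- as.integer(yy_chr[ok])
year4[ok] <- ifelse(yy <= 68, 2000L + yy, 1900L + yy)

# Inspect the entries that were rejected
print(yy_chr[!ok])
print(year4)
```

Looking at the rejected values often shows the cause, such as stray whitespace, empty strings, or four-digit years mixed into the column.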