
I have a data frame in R with two date variables, and I need to calculate the difference in days between them. However, they are formatted as YYYYMMDD. How do I convert them to a date format that R can work with?

amonk
PaulaF
  • `as.Date(df, format="%m/%d/%Y")` – M-- Jun 21 '17 at 20:40
  • Please use `dput` to provide us with a sample of your data. – G5W Jun 21 '17 at 20:40
  • As @G5W said, please read [How to make a great reproducible example in R?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – M-- Jun 21 '17 at 20:41
  • Searching SO for `[r] convert date format`, I found a couple of likely candidates: https://stackoverflow.com/questions/30915555/convert-date-format-to-ccyymmdd-hhmmss and https://stackoverflow.com/questions/21645892/convert-date-format-to-appropriate-one – r2evans Jun 21 '17 at 20:43
  • Also consider using package [`anytime`](https://cran.r-project.org/web/packages/anytime/anytime.pdf). – M-- Jun 21 '17 at 20:45
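The format string in the first comment, `%m/%d/%Y`, matches values like `06/21/2017`. For YYYYMMDD values the base-R equivalent would be `%Y%m%d`. A minimal sketch, assuming a data frame `df` with two hypothetical columns `start` and `end` (one stored as character, one as integer):

```r
# Hypothetical example data: two YYYYMMDD columns, one stored as
# character and one as integer (both forms occur in practice).
df <- data.frame(
  start = c("20170623", "20170701"),
  end   = c(20170705L, 20170724L),
  stringsAsFactors = FALSE
)

# Convert to character first: as.Date() on a bare integer would not
# parse it as YYYYMMDD.
df$start_date <- as.Date(as.character(df$start), format = "%Y%m%d")
df$end_date   <- as.Date(as.character(df$end),   format = "%Y%m%d")

# Difference in days as a plain number
df$days <- as.numeric(difftime(df$end_date, df$start_date, units = "days"))
df
```

`difftime()` is used here so the unit is explicit. Subtracting two `Date` objects directly also yields a difference in days.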

2 Answers


This should work, where `given_date_format` is your vector of YYYYMMDD values:

```r
lubridate::ymd(given_date_format)
```
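A minimal sketch of applying this to the question's two columns, assuming lubridate is installed and using hypothetical column names `start` and `end`:

```r
library(lubridate)

# Hypothetical columns holding YYYYMMDD values stored as numbers
df <- data.frame(start = c(20170623, 20170701),
                 end   = c(20170705, 20170724))

# ymd() parses YYYYMMDD input and returns Date objects
df$start <- ymd(df$start)
df$end   <- ymd(df$end)

# Subtracting two Dates gives a difftime in days; as.numeric() drops the class
df$days <- as.numeric(df$end - df$start)
df
```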
Raj Padmanabhan

I like anydate() from the anytime package. Quick demo, with actual data:

```r
R> set.seed(123)    # be reproducible
R> data <- data.frame(inp=Sys.Date() + cumsum(runif(10)*10))
R> data$ymd <- format(data$inp, "%Y%m%d")     ## as yyyymmdd
R> data$int <- as.integer(data$ymd)           ## same as integer
R> library(anytime)
R> data$diff1 <- c(NA, diff(anydate(data$ymd)))   # reads YMD
R> data$diff2 <- c(NA, diff(anydate(data$int)))   # also reads int
R> data
          inp      ymd      int diff1 diff2
1  2017-06-23 20170623 20170623    NA    NA
2  2017-07-01 20170701 20170701     8     8
3  2017-07-05 20170705 20170705     4     4
4  2017-07-14 20170714 20170714     9     9
5  2017-07-24 20170724 20170724    10    10
6  2017-07-24 20170724 20170724     0     0
7  2017-07-29 20170729 20170729     5     5
8  2017-08-07 20170807 20170807     9     9
9  2017-08-13 20170813 20170813     6     6
10 2017-08-17 20170817 20170817     4     4
R> 
```

Here the first column holds the actual dates we work from. Columns two and three are then generated to match the OP's requirement: YYYYMMDD, either as character or as integer.

We then compute the differences, pad with `NA` for the first row (which has no predecessor), and show that either input format works.

Dirk Eddelbuettel