3

Running into an issue when drawing in Excel data from R and converting to a date within R. I have a "time_period" column that is pulled from Excel in that Excel date format with 5 digit numbers (e.g. 41640).

> head(all$time_period)
[1] "41640" "41671" "41699" "41730" "41760" "41791"

These numbers are originally in chr format so I change them to numeric type with:

all[,3] <- lapply(all[,3], function(x) as.numeric(as.character(x)))

Once that is complete, I run the below to format the date:

all$time_period <-format(as.Date(all$time_period, "1899-12-30"), "%Y-%m-%d")

However, once this action is completed the time_period column is all the same date (presumably the first date in the column).

> head(all$time_period)
[1] "2014-01-01" "2014-01-01" "2014-01-01" "2014-01-01" "2014-01-01" "2014-01-01" 

Any suggestions? Thanks in advance.

Franchise
  • 1,081
  • 3
  • 15
  • 30

2 Answers2

4

set the origin argument in as.Date()

These numbers refer to distances away from an origin, which depends on the machine the excel file was created on.

  • Windows: as.Date(my_date, origin = "1899-12-30")
  • Mac: as.Date(my_date, origin = "1904-01-01")

For example:

x <- c("41640","41671","41699","41730","41760","41791")
x <- as.numeric(x)
format(as.Date(x, "1899-12-30"), "%Y-%m-%d")

Returns:

[1] "2014-01-01" "2014-02-01" "2014-03-01" "2014-04-01" "2014-05-01" "2014-06-01"
Rich Pauloo
  • 7,734
  • 4
  • 37
  • 69
JayCe
  • 241
  • 3
  • 9
  • For anyone wondering what `"1899-12-30"` means, it's the `origin` of dates for Microsoft Excel on a **Windows** machine. If using **Mac**, the origin should be `1904-01-01`. [See this blog post for details](https://www.r-bloggers.com/2013/08/date-formats-in-r/). Importantly, the origin refers to the machine the the file was created on, NOT the machine you're reading it from. For example, an excel file created in Windows, then opened in R on a Mac should specify `origin = "1899-12-30"`. – Rich Pauloo Jun 22 '21 at 01:15
2

I believe this one line solves your problem, you don't need to format it, as de default of as.Date function is "%Y-%m-%d".

 time_period = c("41640", "41671", "41699", "41730", "41760", "41791")

 as.Date(as.numeric(time_period), origin =  "1899-12-30").
Giovana Stein
  • 451
  • 3
  • 13