
I have imported some data. The problem is that the dates are stored as numbers such as 20140101 (YYYYMMDD).

I want the standard format 2014-01-01, but when I try as.Date(datecolumn, format = '%Y-%m-%d') I get very strange year values.

How can I convert my dates to the standard date format quickly and efficiently?

I can do it by using paste0, but surely there must be a better way?

Output

> head(backup)
  quote_date   open   high    low  close volume value
1   20151203 263.10 263.10 263.10 263.10      0     0
2   20151202 264.51 264.51 264.51 264.51      0     0
3   20151201 261.91 261.91 261.91 261.91      0     0
4   20151130 260.68 260.68 260.68 260.68      0     0
5   20151127 256.75 256.75 256.75 256.75      0     0
6   20151125 253.93 253.93 253.93 253.93      0     0
> str(backup)
'data.frame':   960 obs. of  7 variables:
 $ quote_date: int  20151203 20151202 20151201 20151130 20151127 20151125 20151124 20151123 20151120 20151119 ...
 $ open      : num  263 265 262 261 257 ...
 $ high      : num  263 265 262 261 257 ...
 $ low       : num  263 265 262 261 257 ...
 $ close     : num  263 265 262 261 257 ...
 $ volume    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ value     : int  0 0 0 0 0 0 0 0 0 0 ...

> head(as.Date(backup$quote_date, format = '%Y%m%d'))
[1] NA NA NA NA NA NA
uncool
  • `as.Date(datecolumn, format = '%Y%m%d')` ? – romants Dec 04 '15 at 19:34
  • The `format` argument is for the format that the data is already in, not what you are trying to make it into. `help(as.Date)` – Rich Scriven Dec 04 '15 at 19:34
  • When I use format = '%Y%m%d' I get NA for every returned value. – uncool Dec 04 '15 at 19:35
  • @uncool Notice that the values are being read in as integers, so the first function to apply would be `as.character`; then `as.Date` will apply the correct transformation. There are different `as.Date` methods, one for character values and another for numeric values. – IRTFM Dec 04 '15 at 20:02
  • ...additionally, something fishy is going on with the code you claim to have run because passing integers to `as.Date()` without specifying an `origin` will return an error, not `NA`, as documented in `?as.Date`. – joran Dec 04 '15 at 20:04
  • @joran I tested with numeric and integer values and get no error, although I will admit that I, too, thought that there would be one. – IRTFM Dec 04 '15 at 20:08
  • @42- Odd. When I look at `as.Date.numeric` the first two lines are just a check on `missing(origin)` and if that's true it throws an error. – joran Dec 04 '15 at 20:10
  • I closed this because we have literally hundreds of answered `Date` questions, and what needed to be said has been said. Let's give the search function a day in the sun. – Dirk Eddelbuettel Dec 04 '15 at 20:15
  • @DirkEddelbuettel The necessary search strategy to uncover the differences in observed behavior will need the term `zoo`. – IRTFM Dec 04 '15 at 20:27
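Putting the comments together, here is a minimal sketch of the fix, assuming the column is the integer `quote_date` shown by `str(backup)` above. The three-row `backup` data frame is a hypothetical stand-in for the real 960-row one. The point from the comments is that `format` describes the layout the data is already in ("%Y%m%d", no dashes), and that the integers must become character first:

```r
# Hypothetical stand-in for the question's `backup` data frame:
# quote_date is an integer column holding YYYYMMDD values.
backup <- data.frame(
  quote_date = c(20151203L, 20151202L, 20151201L),
  close      = c(263.10, 264.51, 261.91)
)

# `format` describes the INPUT layout, so use "%Y%m%d" (no dashes).
# as.character() first, so the character method of as.Date parses the
# digits instead of the numeric method treating them as day counts.
backup$quote_date <- as.Date(as.character(backup$quote_date), format = "%Y%m%d")

str(backup$quote_date)

# A Date prints in the 2015-12-03 layout by default; format() is only
# needed when you want a character string in some other layout.
format(backup$quote_date, "%d/%m/%Y")
```

This converts the whole column in one vectorised call, so no `paste0` step is needed.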

1 Answer

Like @joran, I thought passing integers to as.Date with a format parameter would throw an error, but it doesn't:

> dt <- c(20151203, 20151202, 20151201, 20151130, 20151127, 20151125 ,20151124, 20151123 )
> head(as.Date(dt, format = '%Y%m%d'))
[1] NA NA NA NA NA NA
> 
> str(dt)
 num [1:8] 20151203 20151202 20151201 20151130 20151127 ...
> mode(dt) <- "integer"
> str(dt)
 int [1:8] 20151203 20151202 20151201 20151130 20151127 20151125 20151124 20151123
> head(as.Date(dt, format = '%Y%m%d'))
[1] NA NA NA NA NA NA
> as.Date(dt, format = '%Y%m%d')
[1] NA NA NA NA NA NA NA NA
> as.Date(as.character(dt), format = '%Y%m%d')
[1] "2015-12-03" "2015-12-02" "2015-12-01" "2015-11-30" "2015-11-27" "2015-11-25" "2015-11-24"
[8] "2015-11-23"
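As the transcript shows, integer versus double storage makes no difference; only the `as.character` step does. A small wrapper makes that explicit. The helper name `yyyymmdd_to_date` is hypothetical, and this is a sketch rather than a definitive implementation; it also covers factor input, since `as.character` returns a factor's labels:

```r
# Hypothetical helper: convert YYYYMMDD codes to Date whatever their
# storage type. as.character() turns integers, doubles and factors into
# the digit strings that the character method of as.Date can parse.
yyyymmdd_to_date <- function(x) {
  as.Date(as.character(x), format = "%Y%m%d")
}

yyyymmdd_to_date(c(20151203L, 20151202L))            # integer input
yyyymmdd_to_date(c(20151203, 20151202))              # double input
yyyymmdd_to_date(factor(c("20151203", "20151202")))  # factor input

# Codes that are not valid calendar dates (month 13 here) come back as NA.
yyyymmdd_to_date(20151332L)
```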

Unlike joran, when I look at the code for as.Date.numeric I see no error thrown when origin is missing, only a replacement with what most people would consider a sensible default:

if (missing(origin))
    origin <- "1970-01-01"
if (identical(origin, "0000-00-00"))
    origin <- as.Date("0000-01-01", ...) - 1
as.Date(origin, ...) + x
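That body also explains both symptoms in the question, assuming the questioner's session had zoo attached. The sketch below copies the quoted body into a local function so it runs without zoo; the name `zoo_style_as_date_numeric` is hypothetical. Because `...` is forwarded to `as.Date(origin, ...)`, the `format` argument ends up parsing the origin string, not the data:

```r
# Hypothetical local copy of the as.Date.numeric body quoted above.
zoo_style_as_date_numeric <- function(x, origin, ...) {
  if (missing(origin))
    origin <- "1970-01-01"
  if (identical(origin, "0000-00-00"))
    origin <- as.Date("0000-01-01", ...) - 1
  as.Date(origin, ...) + x  # `...` (e.g. format) is applied to origin
}

x <- 20151203L

# '%Y-%m-%d' matches the dashed origin, so the origin parses and x is
# added as a count of days: a year far beyond 9999 ("strange years").
d1 <- zoo_style_as_date_numeric(x, format = "%Y-%m-%d")
as.numeric(d1)
d1 > as.Date("9999-12-31")

# '%Y%m%d' cannot parse "1970-01-01", so the origin is NA and so is
# every result.
d2 <- zoo_style_as_date_numeric(x, format = "%Y%m%d")
d2
```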

Wait... I now see that the function listing shows <environment: namespace:zoo>, and I suspect this is the source of our differences. It has come up on SO before.

The zoo package masks base R's as.Date.numeric.
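A quick way to check which method your own session picks up, sketched with base R only (it assumes nothing beyond zoo possibly being attached): look at the environment of the `as.Date.numeric` that a bare name finds, list masked objects with `conflicts()`, and call the base method explicitly with `base::` when you want to bypass the mask:

```r
# Which as.Date.numeric does a bare name find? Without zoo attached this
# should be "base"; after library(zoo) it is expected to be "zoo".
environmentName(environment(as.Date.numeric))

# Objects on the search path that mask one another, with where they live.
conflicts(detail = TRUE)

# Call the base method explicitly. Supplying `origin` keeps this working
# on R versions where base requires it.
base::as.Date.numeric(0, origin = "1970-01-01")
```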

IRTFM
  • You also need `as.character` for factor variables. – Dirk Eddelbuettel Dec 04 '15 at 20:14
  • Yes, that is true. But neither he (nor we) were giving factors to as.Date. – IRTFM Dec 04 '15 at 20:19
  • Must be a difference in R version...? [Here's](https://github.com/wch/r-source/blob/b156e3a711967f58131e23c1b1dc1ea90e2f0c43/src/library/base/R/dates.R#L63) a link to the source that matches what I seem to have...Aha, I see your edit now. That explains it. – joran Dec 04 '15 at 20:21
  • I think we've seen this puzzle before. Will try searching on `[r] zoo as.Date` – IRTFM Dec 04 '15 at 20:23
  • Note that you do get a message when you load zoo that `as.Date` and `as.Date.numeric` are masked. Running the `conflicts()` R function also reports this. The modification is strictly upwardly compatible -- the additional functionality of not requiring an origin would give an error without zoo. – G. Grothendieck Dec 05 '15 at 10:32