18

How to convert between year,month,day and dates in R?

I know one can do this via strings, but I would prefer to avoid converting to strings, partly because there may be a performance hit, and partly because I worry about regionalization issues, where some of the world uses "year-month-day" and some uses "year-day-month".

It looks like ISOdate provides the direction year, month, day -> DateTime, although it first converts the numbers to a string, so if there is a way that doesn't go via a string, I would prefer that.

I couldn't find anything that goes the other way, from datetimes to numerical values. I would prefer not to need strsplit or things like that.

Edit: just to be clear, what I have is a data frame that looks like:

year month day hour somevalue
2004 1     1   1   1515353
2004 1     1   2   3513535
....

I want to be able to freely convert to this format:

time(hour units) somevalue
1             1515353
2             3513535
....

... and also be able to go back again.

Edit: to clear up some confusion about what 'time' (hour units) means, here is what I ultimately did, using information from "How to find the difference between two dates in hours in R?":

forwards direction:

lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL

backwards direction:

... well, I didn't do backwards yet, but I imagine something like:

  • create difftime object out of lh$time (somehow...)
  • add ISOdate(2004,1,1,0) to difftime object
  • use one of the solutions below to get the year, month, day, hour back
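
The backwards steps listed above could be sketched roughly as follows. This is only an illustration, not what was actually run: the example values in `lh` are made up, and it assumes the same origin, `ISOdate(2004,1,1,0)`, and GMT throughout. Adding a number of seconds to a POSIXct value yields another POSIXct value, so an explicit difftime object may not be needed.

```r
# Hypothetical example data: hours elapsed since the origin used in the forwards step.
origin <- ISOdate(2004, 1, 1, 0)   # POSIXct in GMT
lh <- data.frame(time = c(1, 2, 25), somevalue = c(1515353, 3513535, 42))

# Adding seconds to a POSIXct gives a POSIXct.
stamp <- origin + lh$time * 3600

# POSIXlt exposes the calendar fields as numbers
# ($year is years since 1900, $mon is 0-based).
lt <- as.POSIXlt(stamp, tz = "GMT")
lh$year  <- lt$year + 1900
lh$month <- lt$mon + 1
lh$day   <- lt$mday
lh$hour  <- lt$hour

# Round trip: recompute the hour offsets from the recovered components.
roundtrip <- as.numeric(difftime(ISOdate(lh$year, lh$month, lh$day, lh$hour),
                                 origin, units = "hours"))
```

Here `roundtrip` should match `lh$time` whenever the offsets are whole hours.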

I suppose in the future I could ask about the exact problem I'm trying to solve. I was trying to factor my specific problem into generic, reusable questions, but maybe that was a mistake?

Hugh Perkins
  • There are lots of questions already answered here, and there are R Journal / R News articles. – Dirk Eddelbuettel Oct 19 '12 at 14:42
  • There are many almost identical questions on this topic. Here is one: http://stackoverflow.com/q/12863841/602276 – Andrie Oct 19 '12 at 14:42
  • What is the question? Are you so dead against internal conversion to strings that you will only accept Answers that never do that conversion or are you simply interested in the title of your question? If so `ISOdate()` would seem perfectly acceptable. – Gavin Simpson Oct 19 '12 at 15:13
  • Might be clearer if the title were rewritten to go the other way: "How to convert between dates and year, month, day?" (the "R" is unnecessary, that information is carried in the tags ...) – Ben Bolker Oct 19 '12 at 15:16
  • Can you clarify your example (after some sleep, perhaps), since it looks like `time_in_hours` could just be taken from the `hour` column? – Joshua Ulrich Oct 19 '12 at 15:57
  • And, also, where do you want to store the information you've thrown away in the second example data frame to recover the information shown in the first? – Gavin Simpson Oct 19 '12 at 16:02
  • Also, please don't significantly alter Questions like this. Both Answers now do not answer this new Question which just means all the work that went into them is wasted. You could have started a new question for the last edit you just made. *sigh* do I have to delete my Answer now!!? – Gavin Simpson Oct 19 '12 at 16:03

3 Answers

24

Because dates can arrive from files, databases, and so on in many forms, and, as you mention, can be written in different orders or with different separators, representing the input date as a character string is a convenient and useful solution. R doesn't hold the actual dates as strings, and you don't need to process them as strings to work with them.

Internally R is using the operating system to do these things in a standard way. You don't need to manipulate strings at all - just perhaps convert some things from character to their numerical equivalent. For example, it is quite easy to wrap up both operations (forwards and backwards) in simple functions you can deploy.

toDate <- function(year, month, day) {
    ISOdate(year, month, day)
}

toNumerics <- function(Date) {
    stopifnot(inherits(Date, c("Date", "POSIXt")))
    day <- as.numeric(strftime(Date, format = "%d"))
    month <- as.numeric(strftime(Date, format = "%m"))
    year <- as.numeric(strftime(Date, format = "%Y"))
    list(year = year, month = month, day = day)
}

I forgo a single call to strftime() and subsequent splitting on a separator character because you don't like that kind of manipulation.

> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004

$month
[1] 12

$day
[1] 21

Internally R's datetime code works well and is well tested and robust if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than having a date time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions etc.
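
The point that R stores date-times as numbers rather than strings can be checked directly. The following is a minimal sketch using only base R; the variable names `d` and `secs` are just for illustration.

```r
d <- ISOdate(2004, 12, 21)     # POSIXct at 12:00:00 GMT
class(d)                       # "POSIXct" "POSIXt"
is.numeric(unclass(d))         # TRUE: the value is seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(d)

# An explicit format string fixes which field is extracted and in what order,
# so regional day/month ordering conventions do not come into play.
format(d, "%Y-%m-%d")          # "2004-12-21"
as.numeric(format(d, "%m"))    # 12
```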

Gavin Simpson
  • Fair enough. I guess I'm just a little nervous that there could be localization issues sometime? I've been burned so many times where the regional settings affect the month-day order. (Edit: Well, and I guess, strings sounds like not the highest-performance way to handle numbers?) – Hugh Perkins Oct 19 '12 at 15:50
  • Note that nowhere do I create a string in the form of "XX-XX-YYYY" where there may be ambiguity in which of the "XX" refers to day or month. I use well tested functions to extract the *specific* parts of the internal representation of the date/time and render it appropriate. When I ask for the day part I always get the day part etc. – Gavin Simpson Oct 19 '12 at 15:53
  • The localisation only comes in in two places. i) when taking a character representation of the date/time, but that can always be handled by passing both the string **and** the format for that string. ii) the POSIXt representations have timezones and they can be a bit confusing. But this is an issue for how R internally handles DateTimes, and not something related to character representations. – Gavin Simpson Oct 19 '12 at 15:55
  • @HughPerkins: regional settings don't affect day-month order, even if they affected how the objects were _printed_, they wouldn't affect how they're actually stored. You do have to be careful about timezones though, but that's true everywhere. – Joshua Ulrich Oct 19 '12 at 15:56
  • Or use `lubridate` which provides `hour`, `year`, `month` etc functions for you. – hadley Oct 19 '12 at 18:11
  • @hadley +1 yes, I was going to look into / suggest that now that I am home. Been putting babies to bed... – Gavin Simpson Oct 19 '12 at 18:16
  • I'm selecting this answer, because it does answer the question, and it does avoid having to deal with posix conventions for year,day,month. I'd quite like to see a solution using lubridate. As for my own usage, in the end, I went with ISOdate for forwards conversion, and then simply didn't throw away the information, copied it to a new table, so I could use a table lookup to do the backwards conversion. – Hugh Perkins Oct 23 '12 at 03:53
16

I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:

library(lubridate)
library(nycflights13)
library(tidyverse)

a <- flights %>%
  mutate(date = make_date(year, month, day))
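
For data that also carries an hour column, as in the question, a similar sketch is possible with `lubridate::make_datetime` and the accessor functions `year()`, `month()`, `day()` and `hour()`. The data frame `lh` below is made up for illustration, and the code assumes the lubridate package is installed.

```r
library(lubridate)  # assumed to be installed

# Hypothetical data frame with the same columns as in the question.
lh <- data.frame(year = c(2004L, 2004L), month = c(1L, 1L),
                 day = c(1L, 1L), hour = c(1L, 2L))

# Forwards: numeric components to a POSIXct (make_datetime defaults to UTC).
lh$stamp <- make_datetime(lh$year, lh$month, lh$day, lh$hour)

# Backwards: the accessor functions return plain numbers.
back <- data.frame(year = year(lh$stamp), month = month(lh$stamp),
                   day = day(lh$stamp), hour = hour(lh$stamp))
```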
Preston
4

Found one solution for going from date to year,month,day.

Let's say we have a date object, that we'll create here using ISOdate:

somedate <- ISOdate(2004,12,21)

Then, we can get the numerical components of this as follows:

unclass(as.POSIXlt(somedate))

Gives:

$sec
[1] 0

$min
[1] 0

$hour
[1] 12

$mday
[1] 21

$mon
[1] 11

$year
[1] 104

Then one can get what one wants for example:

unclass(as.POSIXlt(somedate))$mon

Note that $year is [actual year] - 1900, $mon is 0-based, and $mday is 1-based (as per the POSIX standard).
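
To avoid having to remember those offsets, the adjustment can be wrapped in a small helper. This is only a sketch: `to_components` is a hypothetical name, and the default of GMT is an assumption chosen to match what `ISOdate` returns.

```r
# Hypothetical helper: convert a date-time to ordinary calendar numbers via POSIXlt.
to_components <- function(x, tz = "GMT") {
  lt <- as.POSIXlt(x, tz = tz)
  data.frame(
    year  = lt$year + 1900,  # $year counts years since 1900
    month = lt$mon + 1,      # $mon is 0-based
    day   = lt$mday,         # $mday is already 1-based
    hour  = lt$hour
  )
}

parts <- to_components(ISOdate(2004, 12, 21))
# parts has year 2004, month 12, day 21, hour 12
```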

Hugh Perkins
  • Your note doesn't deserve a ":-O". Those are the POSIX standards. If you don't like it, use `format` instead: `format(ISOdate(2004,12,21),"%m")`. `ISOdate` **does not** return a string, as `?ISOdate` says, it is a wrapper to `strptime` and returns a `POSIXct` class object. – Joshua Ulrich Oct 19 '12 at 14:54
  • We shouldn't have to convert to strings at all. Have a look at Java's Joda-Time for a really easy to use class. Modding me down for asking about one of R's Achilles' heels doesn't make me feel any better about the fact that I've spent 90 minutes trying to figure this out so far.... – Hugh Perkins Oct 19 '12 at 14:58
  • Please don't confuse being downvoted for "modding" - You'll know when a Mod does something to your Answers/Questions as they are identified by a blue diamond. Voting is part and parcel of [so] and related sites. Get used to it and don't take it too seriously. – Gavin Simpson Oct 19 '12 at 15:06
  • That doesn't help the fact that no-one is actually answering my questions on R dates and times, but just saying it's "obvious". If it was obvious, I wouldn't have asked the questions... – Hugh Perkins Oct 19 '12 at 15:08
  • You don't have to convert to a string. Just convert the output of `ISOdate` to `POSIXlt` and use the `$mon` element: `as.POSIXlt(ISOdate(2004,12,21))$mon+1`. I have a hard time believing you spent 90 minutes trying to figure this out and didn't get to the part where `ISOdate` returns a time-based class, not a string. Read `?DateTimeClasses`. – Joshua Ulrich Oct 19 '12 at 15:09
  • This is patently not what you wanted as you explicitly stated that `ISOdate()` converts to a string internally which wasn't what you wanted. Also, why are you converting this to get the month that you already have! – Gavin Simpson Oct 19 '12 at 15:11
  • I think that might just have been an example. `as.POSIXlt(Sys.time())$mon+1` might have been a better example (i.e. not cluttering things up with `ISOdate` ...) – Ben Bolker Oct 19 '12 at 15:15
  • PS could we relax the tone here just a little bit on both sides? – Ben Bolker Oct 19 '12 at 15:17
  • @Ben, thanks for pointing out why my example looks strange. I've edited it now to point out that the ISOdate is just a way to get the incoming datetime, as you say. – Hugh Perkins Oct 19 '12 at 15:21
  • Can one of the downvoters propose a better way? Do you think that `as.numeric(format(...` is better? – Hugh Perkins Oct 19 '12 at 15:23
  • @Joshua, are you saying 'unclass' converts to a string? I just found that on this page http://stat.ethz.ch/R-manual/R-patched/library/base/html/as.POSIXlt.html , but I've no idea what it actually does. The good point is that it tells me the names of the available fields, which I didn't manage to find in any other documentation. – Hugh Perkins Oct 19 '12 at 15:29
  • @HughPerkins No, `unclass()` strips the class attribute which affects dispatch on the `print()` function you are implicitly calling when typing the name of the object at the command line. With the class, the print method for the `"POSIXlt"` class is called and the internal representation of the time rendered as a string and printed. When unclassed, the default `print()` method is called which prints the object in its native format which is a list with stated components. – Gavin Simpson Oct 19 '12 at 15:31
  • In answer to your comment about `as.numeric()`, then yes I would find that easier and more intuitive than remembering the POSIX standard etc and which components were 0-based. See my Answer which may not be what you want but it is an alternative. I have also removed my -1 as I see now that I had misunderstood what you were doing and wanted. – Gavin Simpson Oct 19 '12 at 15:33
  • @HughPerkins: No, I was referring to, "apparently it [ISOdate] first converts the number to a string!" in your question. Ben Bolker and I have both proposed a better way. There's no need to `unclass` the `POSIXlt` object to access the `mon` vector. Again, this is all described in `?DateTimeClasses`, which `?ISOdate` points you to. – Joshua Ulrich Oct 19 '12 at 15:34
  • @Joshua: `?DateTimeClasses` is quite useful. Thanks! – Hugh Perkins Oct 19 '12 at 15:44
  • At all. Ok, I see I got downvoted for two reasons, one of which is that my writing was not really clear. One of which, perhaps the greater, was the tone of my writing. I've edited out most of the tone (and someone helped me edit out what was left). It's midnight, it's the first time I've used R, I started at 11am this morning, and I should probably take a break and get some sleep! R is very cool by the way. – Hugh Perkins Oct 19 '12 at 15:48
  • @HughPerkins I had similar issues the first time I used dates in R. (See [my question](http://stackoverflow.com/a/7958336/903061).) Rather than unclassing, you can use the `attributes` function to see the names of the available fields. – Gregor Thomas Oct 19 '12 at 16:05
  • @shujaa. `attributes` is very useful. Thanks! – Hugh Perkins Oct 20 '12 at 06:37