43

I have some very simple data in R that needs to have its date format changed:

 date midpoint
1   31/08/2011   0.8378
2   31/07/2011   0.8457
3   30/06/2011   0.8147
4   31/05/2011   0.7970
5   30/04/2011   0.7877
6   31/03/2011   0.7411
7   28/02/2011   0.7624
8   31/01/2011   0.7665
9   31/12/2010   0.7500
10  30/11/2010   0.7734
11  31/10/2010   0.7511
12  30/09/2010   0.7263
13  31/08/2010   0.7158
14  31/07/2010   0.7110
15  30/06/2010   0.6921
16  31/05/2010   0.7005
17  30/04/2010   0.7113
18  31/03/2010   0.7027
19  28/02/2010   0.6973
20  31/01/2010   0.7260
21  31/12/2009   0.7154
22  30/11/2009   0.7287
23  31/10/2009   0.7375

Rather than %d/%m/%Y, I would like it in the standard R format of %Y-%m-%d

How can I make this change? I have tried:

nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")

But that just cut off the year and added zeros to the day:

 [1] "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20"
 [6] "0031/03/20" "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20"
 [11] "0031/10/20" "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20"
 [16] "0031/05/20" "0030/04/20" "0031/03/20" "0028/02/20" "0031/01/20"
 [21] "0031/12/20" "0030/11/20" "0031/10/20" "0030/09/20" "0031/08/20"
 [26] "0031/07/20" "0030/06/20" "0031/05/20" "0030/04/20" "0031/03/20"
 [31] "0028/02/20" "0031/01/20" "0031/12/20" "0030/11/20" "0031/10/20"
 [36] "0030/09/20" "0031/08/20" "0031/07/20" "0030/06/20" "0031/05/20"

Thanks!

Henrik
indigo

8 Answers

77

There are two steps here:

  • Parse the data. Your example is not fully reproducible: is the data in a file, or is it already in a text or factor variable? Let us assume the latter. Then, if your data.frame is called X, you can do
 X$newdate <- strptime(as.character(X$date), "%d/%m/%Y")

Now the newdate column holds parsed date-times (strptime() returns class POSIXlt); wrap the call in as.Date() if you want class Date.

  • Format the data. That is a matter of calling format() or strftime():
 format(X$newdate, "%Y-%m-%d")

A more complete example:

R> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"), 
+                    mid=c(0.8378,0.8457,0.8147))
R> nzd
        date    mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
R> nzd$newdate <- strptime(as.character(nzd$date), "%d/%m/%Y")
R> nzd$txtdate <- format(nzd$newdate, "%Y-%m-%d")
R> nzd
        date    mid    newdate    txtdate
1 31/08/2011 0.8378 2011-08-31 2011-08-31
2 31/07/2011 0.8457 2011-07-31 2011-07-31
3 30/06/2011 0.8147 2011-06-30 2011-06-30
R> 

The difference between columns three and four is the type: newdate is a parsed date-time (class POSIXlt, as returned by strptime()), whereas txtdate is character.
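
A quick way to check what each column actually holds is class(). The sketch below assumes a small stand-in data frame named nzd_demo (a hypothetical name, not the original data) and also shows as.Date() as one way to get a pure calendar date:

```r
# Hypothetical three-row stand-in for the question's data
nzd_demo <- data.frame(date = c("31/08/2011", "31/07/2011", "30/06/2011"),
                       mid  = c(0.8378, 0.8457, 0.8147),
                       stringsAsFactors = FALSE)

# Parse the text into a date-time object, then format it back to text
nzd_demo$newdate <- strptime(nzd_demo$date, "%d/%m/%Y")
nzd_demo$txtdate <- format(nzd_demo$newdate, "%Y-%m-%d")

class(nzd_demo$newdate)   # date-time class returned by strptime()
class(nzd_demo$txtdate)   # plain character strings

# as.Date() keeps only the calendar date
nzd_demo$asdate <- as.Date(nzd_demo$newdate)
class(nzd_demo$asdate)
```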

Dirk Eddelbuettel
  • 1
    hmm, this seems extremely complicated for a noob. I ended up just changing the formatting in excel and reading the csv file back into R. I was wanting to know how to easily change it in R in case I had a much bigger file, but this doesn't seem nearly as easy as it should be. That's no slight on your solution, I was just hoping it was much simpler (possibly a way to convert the original column without creating a new one). Is there a way to change the class first and then format it? – indigo Sep 16 '11 at 04:45
  • 3
    @Yuri - That's essentially what Dirk's answer was showing you how to do, though he created some new columns along the way so you can easily see what's happening "under the hood". I recommend walking through his example line by line, inserting a `str(x)` in between each line so you can see the differences in action. – Chase Sep 16 '11 at 11:53
  • 1
    @Chase Thanks I do appreciate the extra column for pedagogical purposes and it helped me see the class difference as well as the format; so thanks for that! Good to know the extra step with the additional column isn't strictly necessary. Thanks to you both! – indigo Sep 17 '11 at 05:50
17
nzd$date <- format(as.Date(nzd$date), "%Y/%m/%d")

The code above contains two mistakes. First, when you pass nzd$date to as.Date you do not say what format the dates are in, so it falls back to its default formats. The help page, ?as.Date, says:

format
A character string. If not specified, it will try "%Y-%m-%d" then "%Y/%m/%d" on the first non-NA element, and give an error if neither works. Otherwise, the processing is via strptime

The second mistake: even though you want the output in %Y-%m-%d format, you passed "%Y/%m/%d" to format().
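
The first mistake can be reproduced on a single value. This is a small sketch using a literal string rather than the original column; the variable names wrong and right are only for illustration:

```r
# No format given: as.Date() falls back to "%Y-%m-%d", then "%Y/%m/%d"
wrong <- as.Date("31/08/2011")

# "31" is read as the year, "08" as the month, and "20" (from "2011") as the day
as.POSIXlt(wrong)$year + 1900   # 31
as.POSIXlt(wrong)$mon + 1       # 8
as.POSIXlt(wrong)$mday          # 20

# With the input format stated, each piece lands where it belongs
right <- as.Date("31/08/2011", format = "%d/%m/%Y")
format(right, "%Y-%m-%d")       # "2011-08-31"
```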

Now, the correct way of doing it is:

> nzd <- data.frame(date=c("31/08/2011", "31/07/2011", "30/06/2011"), 
+                                       mid=c(0.8378,0.8457,0.8147))
> nzd
        date    mid
1 31/08/2011 0.8378
2 31/07/2011 0.8457
3 30/06/2011 0.8147
> nzd$date <- format(as.Date(nzd$date, format = "%d/%m/%Y"), "%Y-%m-%d")
> head(nzd)
        date    mid
1 2011-08-31 0.8378
2 2011-07-31 0.8457
3 2011-06-30 0.8147
hi15
8

You could also use the parse_date_time function from the lubridate package:

library(lubridate)
day<-"31/08/2011"
as.Date(parse_date_time(day,"dmy"))
[1] "2011-08-31"

parse_date_time returns a POSIXct object, so we use as.Date to get a Date object. The first argument of parse_date_time is the vector of dates to parse; the second gives the order in which the day, month, and year appear in your input. This orders argument is what makes parse_date_time so flexible.

Ben Rollert
3

This is really easy using the lubridate package. All you have to do is tell R what format your date is already in; it then converts it to the standard format:

library(lubridate)
nzd$date <- dmy(nzd$date)

That's it.

3

After reading your data in via a textConnection, the following seems to work:

dat <- read.table(textConnection(txt), header = TRUE)
dat$date <- strptime(dat$date, format= "%d/%m/%Y")
format(dat$date, format="%Y-%m-%d")

> format(dat$date, format="%Y-%m-%d")
 [1] "2011-08-31" "2011-07-31" "2011-06-30" "2011-05-31" "2011-04-30" "2011-03-31"
 [7] "2011-02-28" "2011-01-31" "2010-12-31" "2010-11-30" "2010-10-31" "2010-09-30"
[13] "2010-08-31" "2010-07-31" "2010-06-30" "2010-05-31" "2010-04-30" "2010-03-31"
[19] "2010-02-28" "2010-01-31" "2009-12-31" "2009-11-30" "2009-10-31"

> str(dat)
'data.frame':   23 obs. of  2 variables:
 $ date    : POSIXlt, format: "2011-08-31" "2011-07-31" "2011-06-30" ...
 $ midpoint: num  0.838 0.846 0.815 0.797 0.788 ...
Chase
  • I do have a question about POSIXlt in data frames as is referenced here: http://stackoverflow.com/questions/3355107/possibly-inconsistent-behavior-in-qplot When I try to plot the date as an x axis in ggplot, I receive this error -- Error in if (length(range) == 1 || diff(range) == 0) { : missing value where TRUE/FALSE needed -- How would I get this in POSIXct? – indigo Sep 17 '11 at 08:27
  • @Yuri - something like `as.POSIXct(otherStuffHere)` will probably work. Thanks for that link, I wasn't aware of those issues raised by Hadley. – Chase Sep 17 '11 at 12:48
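
A minimal sketch of that conversion, assuming a small stand-in data frame dat_demo (hypothetical, not the original dat): wrapping the strptime() result in as.POSIXct() stores a POSIXct column instead of POSIXlt. The time zone "UTC" is an assumption made only to keep the example reproducible.

```r
# Hypothetical stand-in for the data read in above
dat_demo <- data.frame(date = c("31/08/2011", "31/07/2011", "30/06/2011"),
                       midpoint = c(0.8378, 0.8457, 0.8147),
                       stringsAsFactors = FALSE)

# Parse, then convert the POSIXlt result to POSIXct before storing it
dat_demo$date <- as.POSIXct(strptime(dat_demo$date, format = "%d/%m/%Y", tz = "UTC"))
class(dat_demo$date)

# If the time of day is irrelevant, a Date column is a lighter alternative
dat_demo$day <- as.Date(dat_demo$date)
class(dat_demo$day)
```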
2

A one-liner that converts the dates to the preferred format:

nzd$date <- format(as.Date(nzd$date, format = "%d/%m/%Y"), "%Y-%m-%d")
1

I believe that

nzd$date <- as.Date(nzd$date, format = "%d/%m/%Y")

is sufficient.
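
One reason stopping at a Date column can be enough: a Date prints in %Y-%m-%d form by default and still supports arithmetic and sorting, which a formatted character column does not. A short sketch, assuming a hypothetical stand-in data frame nzd_demo:

```r
# Hypothetical three-row stand-in for the question's data
nzd_demo <- data.frame(date = c("31/08/2011", "31/07/2011", "30/06/2011"),
                       midpoint = c(0.8378, 0.8457, 0.8147),
                       stringsAsFactors = FALSE)
nzd_demo$date <- as.Date(nzd_demo$date, format = "%d/%m/%Y")

# Dates print in ISO form without any call to format()
print(nzd_demo$date)

# Date arithmetic: days from 31 July 2011 to 31 August 2011
days_between <- as.numeric(nzd_demo$date[1] - nzd_demo$date[2])

# Chronological sorting, oldest row first
sorted <- nzd_demo[order(nzd_demo$date), ]
```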

joran
0

If your input has a consistent date format, you may try a simple workaround that reorders the pieces as text:

sapply(date, function(x){paste(strsplit(x, '/')[[1]][c(3,2,1)], collapse = '/')})
ae2487
  • To deal with NAs, you may need: `sapply(date, function(x){ifelse(is.na(x),NA,paste(strsplit(x, '/')[[1]][c(3,2,1)], collapse = '/'))})` – ae2487 Jun 01 '23 at 13:21
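
The same text-reordering idea can also be sketched without a per-element loop. This version swaps strsplit() for a single vectorised sub() with a regular expression, which is a different technique from the workaround above; it leaves NA values as NA and uses "-" as the separator to match the format asked for in the question. The vector dates_demo is a hypothetical input, and the result is still character, not Date.

```r
# Hypothetical input with a missing value
dates_demo <- c("31/08/2011", NA, "30/06/2011")

# Capture day, month, year and emit them as year-month-day
iso_text <- sub("^([0-9]{2})/([0-9]{2})/([0-9]{4})$", "\\3-\\2-\\1", dates_demo)
iso_text
```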