1

I have a lot of different dates in one column of my dataframe. I would like to aggregate the data so that only the year is kept; I do not need months and days. Originally the entries were saved as integer. The function as.Date returns nonsense

"0011-06-20"

instead of

"11-06-2000"

So I used as.character.Date and got valid results:

as.character.Date(Training_lowNA$last_swap)
[1] "11/6/2000 "

From these results I now want to erase the day and the month, only keeping the year. Or would it have been easier to do the same with integers?

I would be glad if there was a helpful idea around!

EDIT: my input data has 50,000 entries of dates of the format

[9955] 8/14/2001  5/27/2001  3/16/2001                        4/13/2000 
[9961] 7/1/2000   5/18/2000  8/6/2001   7/17/2000             9/16/2001 
[9967] 10/21/2000 7/24/2001  5/6/2000   12/18/2000                      
[9973] 1/11/2001  7/31/2001                        9/17/2001  3/8/2001  
[9979]            9/30/2000                        7/12/2001  8/20/2000 
[9985]            10/20/2000 9/21/2000  9/27/2000  7/18/2000            
[9991]            10/1/2000                                             
[9997] 9/17/2001  7/22/2001  11/6/2000  5/31/2001 
[ reached getOption("max.print") -- omitted 40000 entries ]

What I would like as output is:

[9955] 2001  2001  2001                        2000 
[9961] 2000   2000 2001   2000             2001 
[9967] 2000 2001  2000   2000                      
[9973] 2001  2001                        2001  2001  
[9979]            2000                        2001  2000 
[9985]            2000 2000  2000  2000            
[9991]            2000                                             
[9997] 2001  2001  2000  2001 

EDIT #2

As David suggested below, I tried his approach:

Training_lowNA[] <- lapply(Training_lowNA, function(x) format(as.Date(x, "%m/%d/%Y"), "%Y")). 

the debug shows:

function (x) 
{
xx <- x[1L]
if (is.na(xx)) {
    j <- 1L
    while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
    if (is.na(xx)) 
        f <- "%Y-%m-%d"
}
if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d", tz = "GMT")) || 
    !is.na(strptime(xx, f <- "%Y/%m/%d", tz = "GMT"))) 
    return(strptime(x, f))
stop("character string is not in a standard unambiguous format")

and here comes EDIT #3:

> dput(head(Training_lowNA$last_swap))
structure(c(78L, 32L, 1100L, 1019L, 522L, 265L), .Label = c("", 
"1/1/2000", "1/1/2001", "1/1/2002", "1/10/1999", "1/10/2000", 
"here follow 50,000 entries of this sort", "9/9/2000", "9/9/2001"
), class = "factor")
PikkuKatja
  • 1,101
  • 3
  • 13
  • 21
  • 2
    Could you show your input dataset and expected output? – akrun Mar 04 '15 at 12:17
  • 1
    Is this the same problem? http://stackoverflow.com/questions/13456241/convert-unix-epoch-to-date-object-in-r – Balint Domokos Mar 04 '15 at 12:19
  • @BalintDomokos not exactly, because my data is not in UNIX format... – PikkuKatja Mar 05 '15 at 09:25
  • 1
    Can you give an example integer in your data frame, and an the corresponding date what you want as output? – Balint Domokos Mar 05 '15 at 09:29
  • I just edited the post and hope that this is what you wanted. – PikkuKatja Mar 05 '15 at 09:31
  • 1
    Try `df[] <- lapply(df, function(x) format(as.Date(x, "%m/%d/%Y"), "%Y"))` – David Arenburg Mar 05 '15 at 09:40
  • hey @DavidArenburg, I just tried your approach and I get the following error: "character string is not in a standard unambiguous format" I will edit the post once more showing the exact code. – PikkuKatja Mar 05 '15 at 09:52
  • It works on the data you provided. Can you provide a `dput(head(Training_lowNA))`? – David Arenburg Mar 05 '15 at 10:00
  • since `Training_lowNA` contains about 150 variables, I guess it is better to provide only `dput(head(Training_lowNA$last_swap))` ? – PikkuKatja Mar 05 '15 at 10:09
  • Your `dput` doesn't contain any element which are not `NA`, though the data structure seems OK, I can't imagine why are you getting this error – David Arenburg Mar 05 '15 at 10:18
  • in fact, there are `NA` in the data. Would it help to convert the data beforehand somehow? – PikkuKatja Mar 05 '15 at 10:20
  • This should work with `NA`s too. I can't reproduce your error. – David Arenburg Mar 05 '15 at 10:23
  • well, I would anyways like to thank you for your attempt -- I also learned a lot about how to phrase a question, which info to provide and how to behave in this forum! :-) maybe someone else will find another path that I do not mess up... – PikkuKatja Mar 05 '15 at 10:27
  • Wait, you only try to modify one column?? I thought we talking the whole data set. Just do `Training_lowNA$last_swap <- format(as.Date(Training_lowNA$last_swap, "%m/%d/%Y"), "%Y")` – David Arenburg Mar 05 '15 at 10:28
  • HA! And here we go! It worked amazingly fine. Sadly, I still do not get the difference and why it did not work before. – PikkuKatja Mar 05 '15 at 10:33
  • 1
    Because I was trying to modify all the columns in your data set instead of that one. Actually @A.Val. already provided that answer. Let me just modify it a bit. You can accept it now. – David Arenburg Mar 05 '15 at 10:34

3 Answers3

2

First thing, you need to make proper date object from string:

(a <- as.Date("9/21/2000", "%m/%d/%Y"))
## [1] "2000-09-21"

Then you can extract year with:

format(a, "%Y")
## [1] "2000"

Which combines into one-liner, given you have vector with date:

format(as.Date(df$date, "%m/%d/%Y"), "%Y")
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
statespace
  • 1,644
  • 17
  • 25
1

try using year() function from the lubridate package.

Refer this link

James
  • 163
  • 1
  • 8
  • I did not know the lubridate package at all -- this will come very handy in future! – PikkuKatja Mar 05 '15 at 08:58
  • The package is particulary helpful when you need to, say, add a month to date object. I'm yet to find better way than `%m+% months(x)`. But that's sort of a useful offtopic. – statespace Mar 05 '15 at 12:59
0

The following would do it:

dat <- c("8/14/2001", "5/27/2001", "3/16/2001", "4/13/2000", "7/1/2000", "5/18/2000", "8/6/2001", "7/17/2000", "9/16/2001", "10/21/2000", "7/24/2001", "7/24/1977", "7/24/1999")

ndat <- as.POSIXlt(dat, format="%m/%d/%Y")

as.POSIXlt(ndat)$year + 1900
Ruthger Righart
  • 4,799
  • 2
  • 28
  • 33