I have no experience working with dates in R. I have read the docs, but I still can't figure out why I am getting this error. I am trying to convert a vector of strings into a vector of dates using a specified format. I have tried both a for loop that converts each date individually and vectorized functions like sapply, but neither works. Here is the code using a for loop:

dates = rawData[,ind] # get vector of date strings
print("single date example")
print(as.Date(dates[1]))
dDates = rep(1,length(dates)) # initialize vector of dates
class(dDates)="Date"
for (i in 1:length(dates)){
    dDates[i]=as.Date(dates[i])
}
print(dDates[1:10])

EDIT: info on "dates" variables

[1] "dates"
     V16          V17          V18          V19          V36
[1,] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16 12:00"
[2,] "2014-01-04" "2014-01-18" "2014-01-04" "2014-01-08" "1998-09-04 12:00"
[3,] "2014-03-05" "2014-03-19" "2014-03-05" "2014-03-07" "1996-09-30 05:00"
[4,] "2014-01-21" "2014-02-04" "2014-01-22" "2014-01-24" "1995-08-21 12:00"
[5,] "2014-01-07" "2014-01-21" "2014-01-07" "2014-01-09" "1994-04-07 12:00"
[1] "class(dates)"
[1] "matrix"
[1] "class(dates[1,1])"
[1] "character"
[1] "dim(dates)"
[1] 56557     8

The result I am getting is as follows:

[1] "single date example"
[1] "2014-01-16"
Error in charToDate(x) : 
 character string is not in a standard unambiguous format

So basically, when I parse a single element of the date strings into a date, it works fine. But when I parse the dates in a loop, it breaks. How can this be?

The reason I am using a loop instead of sapply is that sapply returned an even stranger result. When I run:

dDates = sapply(dDates, function(x) as.Date(x, format = "%Y-%m-%d"))

I am getting the following output:

2014-01-16 2014-01-04 2014-03-05 2014-01-21 2014-01-07 2014-01-02 2014-01-08 
    NA         NA         NA         NA         NA         NA         NA 
2014-02-22 2014-01-09 2014-02-22 
    NA         NA         NA 

Which is very strange. As you can see, since my format was correct, it was able to parse out the dates. But for some reason, it is also giving a time value of NA (or at least that is what I think the NA means). Maybe this is happening because some of my date strings have times, while others don't. But the thing is I left the time out of the format because I don't care about time.

Does anyone know why this is happening or how to fix it? I can't find anywhere online how to "set" the time value of a date object easily. I just can't seem to get rid of that NA. And somehow even a for loop doesn't work! Either way, the output is strange and I am not getting the expected results, even though my format is correct. It is very frustrating that a simple thing like parsing a vector of dates is so much more difficult than in MATLAB or Java.

Any help please?

EDIT: when I try simply

dDates = as.Date(dates,format="%m/%d/%Y")

I get the output

"dDates[1:10]"
[1] NA NA NA NA NA NA NA NA NA NA

Still those mysterious NAs. I am also getting an error:

Error in as.Date.default(value) : 
do not know how to convert 'value' to class “Date”
Paul
  • Please supply `dates` using `dput` to make this [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Jun 24 '14 at 14:33
  • and you shouldn't even need `sapply` ... – Ben Bolker Jun 24 '14 at 14:39
  • Your data is probably a factor, not a character vector. – Andrie Jun 24 '14 at 14:41
  • so try `as.Date(as.character(ddate), format="%Y-%m-%d")` – Ben Bolker Jun 24 '14 at 15:40
  • thank you for all of your comments. I have edited the question to show the value and class structure of "dates", while dDates is defined in the code above. – Paul Jun 24 '14 at 16:27
  • I see some of your dates also have associated times ... – Ben Bolker Jun 24 '14 at 16:43
  • @Ben Bolker, yes I noticed that as well. That is actually one of my main questions... why is there a time if I never specified it in my format? I looked everywhere and I can't seem to find a way around this... either using something like a datemidnight class or by setting the time, but I can't find anything like that. If I could change the times from NA to 00:00:00, that would fix my issue – Paul Jun 24 '14 at 17:13
  • as.Date(as.character(ddate), format="%Y-%m-%d") still gives me a vector of NA's – Paul Jun 24 '14 at 17:34
  • Wait, I was using as.Date(as.character(ddate), format="%m-%d-%Y") in the previous comment, not as.Date(as.character(ddate), format="%Y-%m-%d"). When I try that, I get rid of the NAs! (Though still not the error.) I thought the format was supposed to be that of the input string, not the output? – Paul Jun 24 '14 at 17:44

1 Answer

Using a subset of your data,

v <- c("2014-01-16", "2014-01-30", "2014-01-16", "2014-01-17", "1999-03-16 12:00")

these statements are equivalent, since your format is the default one:

as.Date(v)
[1] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16"
as.Date(v, format = "%Y-%m-%d")
[1] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16"
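
A hedged aside on the `format` argument: it describes how the input strings are laid out, not how the result is printed. The sketch below uses a shortened, illustrative copy of the data (named `v2` here, a name that is not in the original post) to show that a format which does not match the input gives `NA` rather than an error.

```r
# Illustrative subset; v2 is a made-up name for this sketch
v2 <- c("2014-01-16", "1999-03-16 12:00")

# Format matches the input layout; the trailing " 12:00" is ignored
ok <- as.Date(v2, format = "%Y-%m-%d")
print(ok)

# Format does not match the input layout, so every element is NA
bad <- as.Date(v2, format = "%m/%d/%Y")
print(bad)
```

So a vector of `NA` values from `as.Date` usually points to a mismatch between `format` and the strings, not to a missing time component.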

If you would like to format the output of your date, use format:

format(as.Date(v), format = "%m/%d/%Y")
[1] "01/16/2014" "01/30/2014" "01/16/2014" "01/17/2014" "03/16/1999"
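
Since `dates` in the question is a character matrix with several columns, one possible way to convert all of it is column by column. This is only a sketch under assumptions: the small matrix `m` stands in for the real data, and `d` is a hypothetical name. As far as I know, calling `as.Date` directly on a matrix returns a plain vector without the dimensions, so the sketch goes through a data frame to keep the columns.

```r
# Stand-in for the real character matrix (two columns, one with times)
m <- matrix(
  c("2014-01-16", "2014-01-04",
    "1999-03-16 12:00", "1998-09-04 12:00"),
  nrow = 2,
  dimnames = list(NULL, c("V16", "V36"))
)

# Convert to a data frame of character columns, then parse each column
d <- as.data.frame(m, stringsAsFactors = FALSE)
d[] <- lapply(d, as.Date, format = "%Y-%m-%d")

# Each column should now be of class Date
str(d)
```

Using `d[] <- lapply(...)` keeps the data frame shape and column names while replacing every column with its parsed version.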
tonytonov