I wrote the following function to convert a vector of Strings to a vector of Dates (the code inside the for loop was inspired by this post: R help converting factor to date). When I pass in a vector of size 1000, this takes about 30 seconds. Not terribly slow, but I ultimately need to pass in about 100,000 so this could be a problem. Any ideas why this is slow and/or how to speed it up?

toDate <- function (dates) 
{
    theDates <- vector()
    for(i in 1:length(dates))
    {
        temp <- factor(dates[i])
        temp <- as.Date(temp, format = "%m/%d/%Y")
        theDates[i] <- temp
    }
    class(theDates) <- "Date"
    return(theDates)
}
pwerth
  • reproducible example please? what's wrong with the answer given in the linked question, which is vectorized already??? – Ben Bolker Jun 12 '14 at 17:32
  • I'm not sure I understand your question. The code works, it is just very slow so I'm wondering if there's a better way to accomplish what I'm trying to do. – pwerth Jun 12 '14 at 17:35
  • You are incurring 100,000 calls to `as.Date()`, one per iteration. That is silly when `as.Date()` is **already** vectorised and thus you could just do `as.Date(dates, format = "%m/%d/%Y")` - you don't need to convert to a factor if they already are one and if not they will be character so doing that conversion to factor incurs further overhead. – Gavin Simpson Jun 12 '14 at 17:36
  • Ahh I see. Thanks! It's running very fast now. – pwerth Jun 12 '14 at 17:41

1 Answer

Just do:

as.Date(dates, format = "%m/%d/%Y")
  1. You don't need to loop over the dates vector, because as.Date() can handle a character vector just fine in a single call. Your function incurs length(dates) calls to as.Date(), plus the same number of calls to factor() and element-wise assignments, all of which add overhead that is totally unnecessary.
  2. You don't want to convert each individual date to a factor. In fact, you don't need to convert them at all (as.Date() will just convert them back to characters). If you did want a factor, factor() is also vectorised, so you could remove the factor() line from the loop and put dates <- as.factor(dates) before the for() loop. But again, you don't need to do this anywhere in your function!
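A minimal sketch of the two points above. The sample values in `dates` are hypothetical, chosen only to match the "%m/%d/%Y" format used in the question; `fromFactor` is an illustrative name, not something from the original function.

```r
# Hypothetical sample input: character dates in month/day/year form
dates <- c("01/15/2014", "06/12/2014", "12/31/2013")

# One vectorised call converts the whole vector: no loop, no factor()
theDates <- as.Date(dates, format = "%m/%d/%Y")

# If the input happens to be a factor already, as.Date() accepts it too
fromFactor <- as.Date(factor(dates), format = "%m/%d/%Y")

print(theDates)
print(all(theDates == fromFactor))
```

Because the result of as.Date() is already of class "Date", the class(theDates) <- "Date" step from the original function is not needed either.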
Gavin Simpson
  • and if you want it even faster see http://stackoverflow.com/questions/12786335/why-is-as-date-slow-on-a-character-vector – Ben Bolker Jun 12 '14 at 17:42
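The linked question is not reproduced here. As an assumption, and not necessarily what that thread recommends, one common base-R way to cut parsing time when many date strings repeat is to parse each distinct string once and then expand the result back to full length. The input vector and the names `uniqueDates` and `parsed` below are hypothetical.

```r
# Hypothetical input: a long vector with many repeated date strings
dates <- rep(c("01/15/2014", "06/12/2014", "12/31/2013"), times = 1000)

# Parse each distinct string only once
uniqueDates <- unique(dates)
parsed <- as.Date(uniqueDates, format = "%m/%d/%Y")

# Map the parsed values back onto the original positions
theDates <- parsed[match(dates, uniqueDates)]

# Sanity check against the plain vectorised call
print(all(theDates == as.Date(dates, format = "%m/%d/%Y")))
```

Whether this helps depends on how many duplicates the data contains; with mostly unique strings, the plain vectorised call is the simpler choice.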