I use the following code to map month names to month numbers, and I find it very slow compared to other data frame computations that do not use a for loop.

Sys.time()
head(df[,4])
for (i in 1:nrow(df)){
  df$monthnum[i]<-match(tolower(as.character(df[i,4])), tolower(month.name))
}
Sys.time()

I got output like this:

> Sys.time()
[1] "2016-03-07 19:20:53 CST"
> dim(df)
[1] 229464      6
> head(df[,4])
[1] January January January January January January
Levels: April August December February January July June March May November October September
> for (i in 1:nrow(df)){
+   df$monthnum[i]<-match(tolower(as.character(df[i,4])), tolower(month.name))
+ }
> Sys.time()
[1] "2016-03-07 19:23:23 CST"

Can anyone explain the logic of a for loop over a data frame? Any information will be appreciated.

xiaojie.wu
  • Maybe [this](http://stackoverflow.com/questions/34822719/why-is-the-time-complexity-of-this-loop-non-linear) helps explain why looping with data frames is so inefficient. Your code is just `df$monthnum <- match(tolower(as.character(df[,4])), tolower(month.name))` – Martin Morgan Mar 07 '16 at 11:37

1 Answer

Use the `sapply` function. First, create your function:

my_function = function(my_month){
  match(tolower(as.character(my_month)), tolower(month.name))
}

Then apply it to the month column with `sapply`:

sapply(df[,4],my_function)
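
Since `match` is itself vectorized, the whole column can probably be passed in a single call without `sapply`, along the lines of the comment under the question. Here is a minimal sketch; the small data frame and its column names are made up for illustration and only assume, as in the question, that the month names are a factor in column 4.

```r
# Hypothetical stand-in for the real data: month names as a factor in column 4
df <- data.frame(
  a = 1:4, b = 1:4, c = 1:4,
  month = factor(c("January", "March", "December", "march"))
)

# match() is vectorized, so the whole column is converted and looked up at once
df$monthnum <- match(tolower(as.character(df[, 4])), tolower(month.name))

print(df$monthnum)
```

Names that do not match any entry in `month.name` come back as `NA`, the same as in the loop version.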
Diego Aguado
  • It's helpful, and it only takes several seconds to finish. Did the loop over the data frame increase the complexity? – xiaojie.wu Mar 07 '16 at 12:48
  • Sorry, I don't get the second part of your comment. If that solved your problem, you can click on the tick to choose it as the answer. – Diego Aguado Mar 07 '16 at 12:53
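
On the follow-up about complexity: as the post linked in the first comment discusses, an assignment like `df$monthnum[i] <- ...` goes through the data frame replacement methods on every iteration, which can copy data each time, so the cost of the loop grows faster than the number of rows. If a loop is really needed, one common workaround is to fill a plain preallocated vector and assign it to the data frame once at the end. This is only a sketch under assumptions: the data frame, the `month` column name, and the helper variables `lookup`, `month_chr`, and `monthnum` are hypothetical.

```r
# Hypothetical example data: 1000 random month names stored as a factor
months <- factor(sample(month.name, 1000, replace = TRUE))
df <- data.frame(id = seq_along(months), month = months)

lookup <- tolower(month.name)
month_chr <- tolower(as.character(df$month))  # convert once, outside the loop

monthnum <- integer(nrow(df))                 # preallocated plain integer vector
for (i in seq_len(nrow(df))) {
  # element-wise assignment into a plain vector avoids the data frame methods
  monthnum[i] <- match(month_chr[i], lookup)
}

df$monthnum <- monthnum                       # single assignment to the data frame
```

The result should be identical to the vectorized `match` call; the loop is shown only to illustrate where the overhead comes from.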