Replacing for-loops in R

Question

I am trying to stop using for loops when I code but I have a bit of a problem representing a simple operation.

Let's say I am trying to do simple nearest-neighbour estimation on a dataset for a company that owns several restaurants. I have three features: City, Store, Month and one target function Sales. City,Store and Month are all represented with numbers: City takes values between 1-100, Store takes values between 1-50 and Month between 1-12.

Now, I want to replace this for-loop with an apply function:

for (c in 1:100){
 for (s in 1:50){
  for (m in 1:12){
   dat1$Sales[dat1$City==c & dat1$Store==s & dat1$Month==m & is.na(dat1$Sales)] <-
    mean(dat1$Sales[dat1$City==c & dat1$Store==s & dat1$Month==m & !is.na(dat1$Sales)])
  }
 }
}

What is the complexity of this apply function?

Many thanks!

Using a data.table, `dat1[,Sales:=Sales[!is.na(Sales)],by='c,s,m']` — Frank, Apr 11 '14 at 18:45
Please review [how to make a great reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). A `dput()` of some of the data frame would be cool to help folks help you. — hrbrmstr, Apr 11 '14 at 18:48
What is this doing? At first glance it just looks like you are removing the rows where sales is NA. Can you show us a little more of the data.frame? — John Paul, Apr 11 '14 at 19:02
Sorry I wasn't clear - I updated the question. I'm replacing each missing item with average of similar items. I didn't include snapshots of the dataset because it seems very easy and trivial. But I'm guessing it's not... — thecheech, Apr 11 '14 at 19:06
Ah, I didn't notice the `mean`. In that case, `dat1[,Sales:=mean(Sales[!is.na(Sales)]),by='c,s,m']` — Frank, Apr 11 '14 at 19:30
@Frank will that replace all of the `Sales` values with the mean or only those `Sales` which are `NA` ? — John Paul, Apr 11 '14 at 19:38
@JohnPaul Oh you are right, hmm. One probably inefficient way would be `dat1[,Sales:={mm=mean(Sales,na.rm=TRUE); ifelse(is.na(Sales),mm,Sales)},by='c,s,m']` I think — Frank, Apr 11 '14 at 19:47

John Paul · Answer 1 · 2014-04-11T19:54:12.390

Try using aggregate. It has a formula like interface that makes it easy to get the results of a function applied on parts of a data.frame. Then just assign the result to the place in dat1 that needs it.

TempOut<- aggregate(Sales~City+Store+Month, FUN=mean,data=dat1)

dat1$Sales[is.na(dat1$Sales),]<-TempOut[TempOut$City==[dat1[is.na(dat1$Sales),]$City 
& TempOut$Store==[dat1[is.na(dat1$Sales),]$Store & TempOut$Month==
[dat1[is.na(dat1$Sales),]$Month,]$Sales

You could combine the creation of TempOut and assignment to dat1$Sales into one line, but that would have made this even harder to read. I don't have your data so I can't test this - but this should get you on the right track, even if there is a typo in there.

score 3 · Answer 2 · edited May 23 '17 at 11:51

3

Here's a data.table way:

require(data.table)
setDT(dat1)

dat1[, Sales:={
  m=mean(Sales,na.rm=TRUE)
  replace(Sales, is.na(Sales), m)
},by=.(City, Store, Month)]

It would be nice to have something like Sales[is.na(Sales)]:=..., but this is just a feature request right now. Here is a similar question.

edited May 23 '17 at 11:51

Community

1
1

answered Apr 11 '14 at 20:06

Frank

66,179
8
96
180

Replacing for-loops in R

2 Answers2