
I am a recent convert to R and am struggling to find the R equivalent of the following: looping over variables named with a common prefix plus a number (var1, var2, ..., varn).

Say I have a dataset where each row is a store and each column is that store's revenue in month 1, month 2, ..., month 6. Some made-up example data:

store = c("a", "b", "c", "d", "c")
rev1 = c(500, 200, 600, 400, 1200) 
rev2 = c(260, 100, 450, 45, 1300)
rev3 = c(500, 150, 610, 350, 900)
rev4 = c(480, 200, 600, 750, 1000)
rev5 = c(500, 68, 750, 350, 1200)
rev6 = c(510, 80, 1000, 400, 1450)
df = data.frame(store, rev1, rev2, rev3, rev4, rev5, rev6) 

I am trying to do something like the following:

varlist <- paste("rev", 1:6)  #create list of variables rev1-rev6 #
for i in varlist {
      highrev[i] <- ifelse(rev[i] > 500, 1, 0) 
}

So for each existing variable rev1 to rev6, I want to create a matching variable highrev1 to highrev6 that equals 1 if the corresponding rev variable is greater than 500 and 0 otherwise.

Can you suggest an appropriate means of doing this?

Kat

3 Answers


In R, we usually don't use loops for such operations. You could simply do:

df[paste0("highrev", 1:6)] <- (df[paste0("rev", 1:6)] > 500) + 0
df
#   store rev1 rev2 rev3 rev4 rev5 rev6 highrev1 highrev2 highrev3 highrev4 highrev5 highrev6
# 1     a  500  260  500  480  500  510        0        0        0        0        0        1
# 2     b  200  100  150  200   68   80        0        0        0        0        0        0
# 3     c  600  450  610  600  750 1000        1        0        1        1        1        1
# 4     d  400   45  350  750  350  400        0        0        0        1        0        0
# 5     c 1200 1300  900 1000 1200 1450        1        1        1        1        1        1
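A small sketch of why the one-liner above works, using a made-up two-row data frame (the values are for illustration only): comparing a data frame with > returns a logical matrix, adding 0 coerces TRUE/FALSE to 1/0 stored as doubles, and adding 0L keeps them as integers.

```r
# Toy data for illustration only (not the OP's full data set)
df <- data.frame(rev1 = c(500, 600), rev2 = c(260, 1300))

flags <- df > 500          # comparing a data frame gives a logical matrix
num_flags <- flags + 0     # TRUE/FALSE -> 1/0, stored as double
int_flags <- flags + 0L    # TRUE/FALSE -> 1L/0L, stored as integer

# Assigning the matrix to a vector of new names adds one column per matrix column
df[paste0("highrev", 1:2)] <- int_flags
df
```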
David Arenburg

setup

varlist  <- paste0("rev",1:6)      # note that this is paste0, not paste
hvarlist <- paste0("hi",varlist)

data.table solution. There is a nice way to do this in data.table:

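library(data.table)
# convert df to a data.table in place, then add the hvarlist columns by reference;
# .SD holds the columns named in .SDcols, and 1L*(x>500) turns TRUE/FALSE into 1L/0L
setDT(df)[, (hvarlist) := lapply(.SD, function(x) 1L * (x > 500)), .SDcols = varlist]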
#    store rev1 rev2 rev3 rev4 rev5 rev6 hirev1 hirev2 hirev3 hirev4 hirev5 hirev6
# 1:     a  500  260  500  480  500  510      0      0      0      0      0      1
# 2:     b  200  100  150  200   68   80      0      0      0      0      0      0
# 3:     c  600  450  610  600  750 1000      1      0      1      1      1      1
# 4:     d  400   45  350  750  350  400      0      0      0      1      0      0
# 5:     c 1200 1300  900 1000 1200 1450      1      1      1      1      1      1
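If data.table isn't available, the same idea (lapply over a chosen set of columns, writing the results under a vector of new names) can be sketched in base R. This is a swapped-in base R variant, not part of the data.table answer; it rebuilds the OP's data so it runs on its own.

```r
# Rebuild the OP's example data
df <- data.frame(
  store = c("a", "b", "c", "d", "c"),
  rev1 = c(500, 200, 600, 400, 1200),
  rev2 = c(260, 100, 450, 45, 1300),
  rev3 = c(500, 150, 610, 350, 900),
  rev4 = c(480, 200, 600, 750, 1000),
  rev5 = c(500, 68, 750, 350, 1200),
  rev6 = c(510, 80, 1000, 400, 1450)
)

varlist  <- paste0("rev", 1:6)
hvarlist <- paste0("hi", varlist)

# lapply returns a list of integer 0/1 vectors, one per column in varlist;
# assigning that list to df[hvarlist] adds them as new columns
df[hvarlist] <- lapply(df[varlist], function(x) as.integer(x > 500))
df
```

Unlike setDT(), this leaves df a plain data.frame.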

The dplyr package is also designed with this sort of operation in mind, but at the time of writing it could not do this cleanly.


A bad alternative. Here's another way, hewing closely to the OP's loop (note that within() adds the new columns in reverse order):

within(df,{for(i in 1:6) assign(hvarlist[i],1L*(get(varlist[i]) > 500));rm(i)})
#   store rev1 rev2 rev3 rev4 rev5 rev6 hirev6 hirev5 hirev4 hirev3 hirev2 hirev1
# 1     a  500  260  500  480  500  510      1      0      0      0      0      0
# 2     b  200  100  150  200   68   80      0      0      0      0      0      0
# 3     c  600  450  610  600  750 1000      1      1      1      1      0      1
# 4     d  400   45  350  750  350  400      0      0      1      0      0      0
# 5     c 1200 1300  900 1000 1200 1450      1      1      1      1      1      1

You can't assign to dynamic variable names with hvarlist[i] <- ...; this is done instead with assign(hvarlist[i],...), but using the latter is not a good habit. Similarly, get must be used to grab a variable on the basis of a string containing its name.
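Inside a data frame, the [[ operator with a character name covers what get and assign are used for here: reading a column by its name and creating one. A minimal sketch of the OP-style loop written that way, on a made-up two-column data frame; unlike the within() version, the new columns come out in loop order.

```r
# Toy data for illustration only
df <- data.frame(rev1 = c(500, 600), rev2 = c(260, 1300))
varlist  <- paste0("rev", 1:2)
hvarlist <- paste0("hi", varlist)

for (i in seq_along(varlist)) {
  # df[[name]] reads a column from a string (in place of get()),
  # and df[[name]] <- value creates or overwrites it (in place of assign())
  df[[hvarlist[i]]] <- as.integer(df[[varlist[i]]] > 500)
}
df
```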

Frank

If you want to keep the loop, you could try this:

store = c("a", "b", "c", "d", "c")
rev1 = c(500, 200, 600, 400, 1200) 
rev2 = c(260, 100, 450, 45, 1300)
rev3 = c(500, 150, 610, 350, 900)
rev4 = c(480, 200, 600, 750, 1000)
rev5 = c(500, 68, 750, 350, 1200)
rev6 = c(510, 80, 1000, 400, 1450)
df = data.frame(store, rev1, rev2, rev3, rev4, rev5, rev6)

As David points out, you don't need ifelse, because > is vectorized and works on the entire data frame at once:

df[, -1] > 500

#       rev1  rev2  rev3  rev4  rev5  rev6
# [1,] FALSE FALSE FALSE FALSE FALSE  TRUE
# [2,] FALSE FALSE FALSE FALSE FALSE FALSE
# [3,]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
# [4,] FALSE FALSE FALSE  TRUE FALSE FALSE
# [5,]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Here is your loop, slightly amended:

for (i in 1:6) {
  x <- paste0('rev', i)            # source column name, e.g. "rev1"
  y <- paste0('highrev', i)        # new column name, e.g. "highrev1"
  df[, y] <- (df[, x] > 500) + 0L  # TRUE/FALSE -> 1L/0L
}
df

#   store rev1 rev2 rev3 rev4 rev5 rev6 highrev1 highrev2 highrev3 highrev4 highrev5 highrev6
# 1     a  500  260  500  480  500  510        0        0        0        0        0        1
# 2     b  200  100  150  200   68   80        0        0        0        0        0        0
# 3     c  600  450  610  600  750 1000        1        0        1        1        1        1
# 4     d  400   45  350  750  350  400        0        0        0        1        0        0
# 5     c 1200 1300  900 1000 1200 1450        1        1        1        1        1        1
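The loop above hard-codes 1:6. If the number of rev columns may vary, one option is to find them by name pattern. This is a sketch assuming every revenue column matches the regular expression ^rev[0-9]+$; the extra rev10 column is made up to show that nothing depends on a fixed count.

```r
# Toy data; rev10 is made up to show the count isn't hard-coded
df <- data.frame(
  store = c("a", "b"),
  rev1  = c(500, 600),
  rev2  = c(260, 1300),
  rev10 = c(501, 10)
)

# Every column whose name is "rev" followed only by digits
rev_cols <- grep("^rev[0-9]+$", names(df), value = TRUE)

for (x in rev_cols) {
  y <- sub("^rev", "highrev", x)    # "rev10" -> "highrev10"
  df[[y]] <- (df[[x]] > 500) + 0L   # TRUE/FALSE -> 1L/0L
}
df
```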
rawr
  • @Frank but without it, my answer is not reproducible. I **hate** it when someone answers a question and I cannot copy/paste to get their result. – rawr May 27 '15 at 22:00
  • Okay, I guess it's a matter of taste; I always run the OP's example before trying questions. – Frank May 27 '15 at 22:06