5

Most pro R users have advised me never to use loops in R and to use the apply functions instead. The problem is that writing an apply equivalent for every for/while loop is not that intuitive if you're not familiar with functional programming. Take the example below.

F <- data.frame(name = c("a", "b", "c", "d"), var1 = c(1, 0, 0, 1), var2 = c(0, 0, 1, 1),
                var3 = c(1, 1, 1, 1), clus = c("one", "two", "three", "four"))
F$ObjTrim <- ""
for (i in 1:nrow(F)) {
  for (j in 2:(ncol(F) - 1)) {
    if (F[i, j] == 1) {
      F$ObjTrim[i] <- paste(F$ObjTrim[i], colnames(F)[j], sep = " ")
    }
  }
  print(i)
}

The objective here is to create a variable "ObjTrim" that holds, for each row, the names of all the columns whose value is 1. Can someone suggest a good apply equivalent for this?

The code above, for example, gives:

 name var1 var2 var3  clus         ObjTrim
1    a    1    0    1   one       var1 var3
2    b    0    0    1   two            var3
3    c    0    1    1 three       var2 var3
4    d    1    1    1  four  var1 var2 var3

Thanks!

agstudy
Shreyes
  • The question I am compelled to ask is *why* do you want a vector of column names for each row? Are you planning on using this to do some subsetting afterwards? Or is that actually the desired output? BTW, `for` loops do have their uses. – Simon O'Hanlon Jun 09 '13 at 07:52
  • The vector "ObjTrim" would have the column names separated by spaces, so I can visually inspect them to see which variables have values equal to one. In this case my data frame has more than 1000 variables. – Shreyes Jun 09 '13 at 07:55
  • Also, correct me if I am wrong: can "all" loops in R be written in a functional form with apply? – Shreyes Jun 09 '13 at 07:58
  • I'd say that almost all `for` loops can be avoided, by using either `apply` loops, built-in functions, or vectorisation. – Paul Hiemstra Jun 09 '13 at 07:59
  • Please help us help you by providing us with a reproducible example (i.e. code and example data), see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for details. – Paul Hiemstra Jun 09 '13 at 08:00
  • @PaulHiemstra In general yes, but don't automatically write off `for` loops. They have their uses. Here are two examples of where `for` loops were the *best* solution... [**here**](http://stackoverflow.com/a/16271419/1478381) and [**also here**](http://stackoverflow.com/a/8382486/1478381) – Simon O'Hanlon Jun 09 '13 at 09:00
  • @SimonO101 I totally agree, but the fact still remains that in the majority of cases they are *not* the best solution. – Paul Hiemstra Jun 09 '13 at 10:20
  • To learn how to write good R code you should read good R code. I think you should start browsing through the R source, find some interesting package, study it, and notice when they use loops and when they don't. – Steve Weston Jun 09 '13 at 11:56

3 Answers

5

Here you can avoid for loops using vectorization: colSums is vectorized, and because it treats TRUE as 1 and FALSE as 0, colSums(F==1) counts how many values in each column are equal to 1.

 colnames(F)[colSums(F==1) != 0] ## names of the columns that contain at least one 1

Here is a test using my reproducible example:

set.seed(1234)
## create matrix 2*10
F <- matrix(sample(1:5, 20, replace = TRUE), nrow = 2,
            dimnames = list(c('row1', 'row2'), paste0('col', 1:10)))

#        col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
# row1    1    4    5    1    4    4    2    2    2     1
# row2    4    4    4    2    3    3    5    5    2     2
colnames(F)[colSums(F==1) != 0]
# [1] "col1"  "col4"  "col10"

PS: Generally it is easy to replace for loops with an "R style solution", but there are some cases where it is difficult or impossible to do that, especially when there is recursion.

EDIT

After the OP's clarification, here is an apply solution:

F$ObjTrim <- apply(F,1,function(x) paste(colnames(F)[x==1],collapse=' '))

 name var1 var2 var3  clus        ObjTrim
1    a    1    0    1   one      var1 var3
2    b    0    0    1   two           var3
3    c    0    1    1 three      var2 var3
4    d    1    1    1  four var1 var2 var3
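One caveat, offered as a hedged sketch rather than a correction: because `name` and `clus` are text, `apply` first converts the whole data frame to a character matrix, so `x == 1` becomes a string comparison. That works for this data, but restricting the call to the indicator columns keeps the comparison numeric. The names `dat`, `ind_cols` and `m` are hypothetical, and the sketch assumes the indicator columns are exactly the numeric ones.

```r
# Rebuild the example data under a hypothetical name (dat) so F is left untouched.
dat <- data.frame(name = c("a", "b", "c", "d"),
                  var1 = c(1, 0, 0, 1),
                  var2 = c(0, 0, 1, 1),
                  var3 = c(1, 1, 1, 1),
                  clus = c("one", "two", "three", "four"))

# Assumption: the indicator columns are exactly the numeric columns.
ind_cols <- names(dat)[sapply(dat, is.numeric)]

# A numeric matrix, so x == 1 below compares numbers, not strings.
m <- as.matrix(dat[ind_cols])

# For each row, paste together the names of the columns that equal 1.
dat$ObjTrim <- apply(m, 1, function(x) paste(colnames(m)[x == 1], collapse = " "))

dat$ObjTrim
# [1] "var1 var3"      "var3"           "var2 var3"      "var1 var2 var3"
```

With more than 1000 variables, selecting the columns once up front also avoids converting unrelated text columns on every call.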
NelsonGon
agstudy
  • I want this for each row. The above code gives me the names of all the columns that have a value of "1" anywhere in the column. I want to perform this operation for each row and get a vector that gives me the names of all the variables that have a value of "1" in that row. – Shreyes Jun 09 '13 at 08:19
  • If you want more specific advice, give us a reproducible example, i.e. data + code. – Paul Hiemstra Jun 09 '13 at 08:35
5

As your comment to @agstudy's answer says that you do want this for each row, maybe this helps you:

df <- F [, 2:4]
df
#   var1 var2 var3
# 1    1    0    1
# 2    0    0    1
# 3    0    1    1
# 4    1    1    1

ones <- which (df == 1, arr.ind=TRUE)
ones
#      row col
# [1,]   1   1
# [2,]   4   1
# [3,]   3   2
# [4,]   4   2
# [5,]   1   3
# [6,]   2   3
# [7,]   3   3
# [8,]   4   3

This you can aggregate by row:

aggregate (col ~ row, ones, paste)
#   row     col
# 1   1    1, 3
# 2   2       3
# 3   3    2, 3
# 4   4 1, 2, 3

If you insist on having the colnames instead of indices, replace the cols in ones first:

ones <- as.data.frame (ones) 
ones$col <- colnames (df)[ones$col]
aggregate (col ~ row, ones, paste)
#   row              col
# 1   1       var1, var3
# 2   2             var3
# 3   3       var2, var3
# 4   4 var1, var2, var3
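If a single space-separated string per row is wanted, as in the question, the grouping step can also be sketched with `tapply`, which is swapped in here for `aggregate` as a base R alternative. The names `ind`, `hits` and `res` are hypothetical. Note the assumption that every row contains at least one 1; a row without any would simply be missing from the result.

```r
# Hypothetical stand-in for the indicator columns of the example data.
ind <- data.frame(var1 = c(1, 0, 0, 1),
                  var2 = c(0, 0, 1, 1),
                  var3 = c(1, 1, 1, 1))

# Row/column positions of every cell equal to 1.
hits <- which(ind == 1, arr.ind = TRUE)

# Group the matching column names by row and collapse each group into one string.
res <- tapply(colnames(ind)[hits[, "col"]], hits[, "row"], paste, collapse = " ")

as.vector(res)
# [1] "var1 var3"      "var3"           "var2 var3"      "var1 var2 var3"
```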

Of course, you could also use apply along the rows:

apply (df, 1, function (x) paste (colnames (df) [x == 1], collapse = " "))
# [1] "var1 var3"       "var3"             "var2 var3"       "var1 var2 var3"

For your problem, vectorized functions exist so neither for loops nor apply are needed.

However, there are cases where for loops are the clearer (faster to read) and sometimes also the faster (to compute) alternative. This is particularly the case when looping a few times lets you use vectorized functions and saves applying some other function over a large margin.
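As a rough illustration of that last point, here is a sketch that loops over the three indicator columns and lets vectorized functions handle all rows at once, instead of applying a function to every row. The names `dat`, `ind_cols` and `obj` are hypothetical, and the column list is assumed to be known in advance.

```r
# Hypothetical copy of the example data.
dat <- data.frame(name = c("a", "b", "c", "d"),
                  var1 = c(1, 0, 0, 1),
                  var2 = c(0, 0, 1, 1),
                  var3 = c(1, 1, 1, 1),
                  clus = c("one", "two", "three", "four"))

ind_cols <- c("var1", "var2", "var3")   # assumed list of indicator columns
obj <- character(nrow(dat))             # one empty string per row

# The loop runs once per column; each step is vectorized over all rows.
for (col in ind_cols) {
  hit <- dat[[col]] == 1
  obj <- paste0(obj, ifelse(hit, paste0(" ", col), ""))
}

dat$ObjTrim <- trimws(obj)              # drop the leading space
dat$ObjTrim
# [1] "var1 var3"      "var3"           "var2 var3"      "var1 var2 var3"
```

Whether this beats `apply` depends on the shape of the data: it does one pass per column, so it tends to pay off when there are many rows and comparatively few columns.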

cbeleites unhappy with SX
4

To answer what seems to be your generic question instead of the example you cited --- how to convert a for loop into an apply variant --- the following may be a few useful pointers:

  1. Consider the structure of the object that you are iterating over. There may be different types, for example:

    a) Elements of a vector / matrix. b) Rows / Columns of a matrix. c) A dimension of a higher dimensional array. d) Elements of a list (which within themselves may be one of the objects cited above). e) Corresponding elements of multiple lists / vectors.

    In each case, the function you employ may be slightly different but the strategy to use is the same. Moreover, learn the apply family. The various *pply functions are based on a similar abstraction but differ in what they take as input and what they return as output.

  2. For each case in the list above, for example:

    a) Elements of a vector: Look for already existing vectorized solutions (as given above) which are a core strength in R. On top of that consider matrix algebra. Most problems that seem to require loops (or nested loops) can be written as equations in matrix algebra.

    b) Rows / Columns of a matrix: Use apply. Use the correct value for the MARGIN argument. Similarly for c), for higher dimensional arrays.

    d) Use an lapply. If the output you return is a 'simple' structure (a scalar or a vector), you may consider sapply which is simply simplify2array(lapply(...)) and returns an array in the appropriate dimensions.

    e) Use mapply. The 'm' can stand for multivariate apply.

  3. Once you have understood the object you are iterating over and the corresponding tool, simplify your problem. Think not of the overall object you are iterating over but one instance of it. For example when iterating over rows of a matrix, forget about the matrix and remember only the row.

    Now, write a function (or a lambda) that operates on only the one instance (element) of your iterand and simply 'apply' it using the correct member of the *pply family.
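The mapping from cases a) to e) onto tools can be sketched with a few toy calls. This is only an illustration on made-up data; the object names (`v`, `m`, `lst` and the result variables) are hypothetical.

```r
# a) Elements of a vector: prefer an existing vectorized operation.
v <- c(1, 2, 3)
squares <- v^2                                   # 1 4 9

# b) Rows / columns of a matrix: apply with the right MARGIN.
m <- matrix(1:6, nrow = 2)                       # rows are (1, 3, 5) and (2, 4, 6)
row_sums <- apply(m, 1, sum)                     # MARGIN = 1 -> 9 12
col_sums <- apply(m, 2, sum)                     # MARGIN = 2 -> 3 7 11

# d) Elements of a list: lapply returns a list, sapply simplifies when it can.
lst <- list(a = 1:3, b = 4:6)
lens  <- lapply(lst, length)                     # list(a = 3, b = 3)
means <- sapply(lst, mean)                       # named vector: a = 2, b = 5

# e) Corresponding elements of several vectors: mapply.
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9
```

In each call the function argument only ever sees one instance (one row, one list element, one pair), which is the simplification step 3 describes.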

Now, let's take your example problem to use this strategy and replicate the clean solution given by @agstudy.

  1. The first thing to identify is that you are iterating over the rows of the matrix. Clearly, you understand this as your looping solution starts with for (i in 1:nrow(F)).

  2. Identify apply as your friend.

  3. Understand what you need to do with this row. First of all you want to find out which values are 1. Then you need to find the colnames of these values. And then find a way to concatenate these colnames. If I may take the liberty of rewriting @agstudy's solution to help explain:

    process.row <- function (arow) {
      ones <- arow == 1 # Returns logical vector.
      cnames <- names(arow)[ones] # Logical subsetting of the row's column names.
      cnames <- paste(cnames, collapse=' ') # Paste the names together.
      cnames # Return
    }
    

    And you get the solution:

    F$ObjTrim = apply(X=F, MARGIN=1, FUN=process.row)
    

    Then, when thinking like this becomes instinctive, you can use R's capability to write dense expressions such as:

    F$ObjTrim = apply(F,1,function(x) paste(colnames(F)[x==1],collapse=' '))
    

which uses a 'lambda' rolled on-the-fly to get the job done.

asb