I am trying to create a variable that contains a list of all of the column names that are not zero for each row.

Example of data:

set.seed(334)
DF <- matrix(sample(0:9,9),ncol=4,nrow=10)
DF <- as.data.frame.matrix(DF)
DF$id <- c("ty18","se78","first", "gh89", "sil12","seve","aga2", "second","anotherX", "CH560")
DF$count <- rowSums(DF[,2:5]>0)
DF
>      V1 V2 V3 V4       id count
>   1   9  4  0  5     ty18     3
>   2   4  0  5  8     se78     3
>   3   0  5  8  2    first     4
>   4   5  8  2  6     gh89     4
>   5   8  2  6  7    sil12     4
>   6   2  6  7  3     seve     4
>   7   6  7  3  9     aga2     4
>   8   7  3  9  4   second     4
>   9   3  9  4  0 anotherX     3
>   10  9  4  0  5    CH560     3

The desired output would be a new variable that was, for row 1, "V1 V2 V4" and for row 2 "V1 V3 V4". I only want to use the V1-V4 for this, and not consider id or count.

This question on SO helped: For each row return the column name of the largest value

I tried to test this out, but it ignores my selective columns, even for max, so the first test here just gives the max for the whole row, which is not always in V1-V4 in my data.

DF$max <- colnames(DF)[apply(DF[,1:4],1,which.max)]

Despite the error, I think I need to do something like this, but my DF$list attempt is clearly all wrong:

DF$list <- colnames(DF[,1:4]>0)

I'm getting

Error in `$<-.data.frame`(`*tmp*`, "list", value = c("V1", "V2", "V3",  : 
replacement has 4 rows, data has 10

Maybe I'm trying to put a vector into a cell, and that is why it doesn't work, but I don't know how to get this information out and then make it into a string. I also don't understand why the max on selective columns did not work.

jessi
  • For anyone who is curious, the solution that @orizon gives for getting the name of the max of a subset of columns in the data frame is here `DF$max <- simplify2array( apply( DF[,1:4], 1, function(x) names(DF[,1:4])[which.max(x)] ) )` – jessi Apr 01 '14 at 19:19

1 Answer

How about this:

DF$nonzeros <- simplify2array(
                      apply(
                        DF[1:4], 1, 
                        function(x) paste(names(DF[1:4])[x != 0], collapse = " ")
                      )
                )
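For anyone who wants to check the result without relying on the seed (the output of `sample()` for a given seed can differ between R versions), here is a minimal sketch that rebuilds V1-V4 from the values printed in the question. The `vals` and `value_cols` names are only illustrative. It also assumes that `apply()` already returns a plain character vector here, so the `simplify2array()` wrapper is left out.

```r
# Rebuild the V1-V4 columns from the values printed in the question
# (the nine sampled digits are recycled down the columns of a 10 x 4 matrix).
vals <- c(9, 4, 0, 5, 8, 2, 6, 7, 3)
DF <- as.data.frame(matrix(rep_len(vals, 40), ncol = 4, nrow = 10))

# Name the columns to inspect explicitly, so that extra columns such as
# id or count are never considered (value_cols is an illustrative name).
value_cols <- c("V1", "V2", "V3", "V4")

# For each row, keep the names of the non-zero entries and join them with spaces.
# Inside apply(), each row x is a named vector, so names(x) gives the column names.
DF$nonzeros <- apply(
  DF[value_cols], 1,
  function(x) paste(names(x)[x != 0], collapse = " ")
)

print(DF$nonzeros[1:3])
```

The first two rows should match the desired output from the question, "V1 V2 V4" and "V1 V3 V4".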
orizon
  • Thanks. Can you tell me why the max did not work the way that I put it? It also does not work with: `DF$max <- simplify2array( apply( DF[1:4], 1, function(x) names(DF[1:4])[max(x)] ) )` – jessi Apr 01 '14 at 18:18
  • Your revision of my answer does not work with `max` because `max` returns the maximum value, not the position of the maximum. It does return something helpful if you use `which.max` instead, but as you found earlier, that gives only the column holding the maximum, not all non-zero entries. – orizon Apr 01 '14 at 18:28
  • Great. Thank you again. The DF$nonzeros was exactly what I needed, and I just wanted to understand the other. I really appreciate your response. – jessi Apr 01 '14 at 18:59
  • @orizon, Your solution was great !! +1 for the one liner. I wanted to check what if I want to find top 5 columns greater than a particular value. So I understand I can give the condition as **DF$top5 <- simplify2array( apply( DF[1:4], 1, function(x) paste(names(DF[1:4])[x > 2], collapse = " ") ) )** But how can I restrict it to return top5 column names only – user1412 Nov 24 '17 at 11:09
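
The difference between `max` and `which.max` discussed in the comments can be seen on a single row. The sketch below uses row 1 of the example data as a named vector. It also includes one possible way to limit the result to the top few columns above a threshold, as asked in the last comment. The helper `top_n_names` is hypothetical and not part of the original answer.

```r
# Row 1 of the example data as a named vector
x <- c(V1 = 9, V2 = 4, V3 = 0, V4 = 5)

max(x)                  # the largest value itself (9), not a position
which.max(x)            # the position of the largest value (1, named V1)
names(x)[max(x)]        # asks for the 9th name, which does not exist, so NA
names(x)[which.max(x)]  # the name of the column holding the maximum
names(x)[x != 0]        # the names of all non-zero columns

# Hypothetical helper: names of the n largest entries above a threshold,
# ordered from largest to smallest.
top_n_names <- function(x, n, threshold) {
  keep <- x[x > threshold]                    # drop entries at or below the threshold
  head(names(sort(keep, decreasing = TRUE)), n)  # head() copes with fewer than n entries
}

paste(top_n_names(x, n = 2, threshold = 2), collapse = " ")
```

To use the helper on the data frame, it can be called inside `apply()` over the selected columns in the same way as the answer's anonymous function, with `paste(..., collapse = " ")` around it.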