Subset rows with highest value from multiple columns

Question

I have this kind of issue: I would like to assign a char value to each row based on the highest value within different columns. I'll try to explain better:

      col1 col2 col3 col4
row1  10   20   30
row2  20   10   10
row3  30   10   10

I would like to assign to col4 as a factor the name of the column with the highest value within the col1:col3 interval for each row of the Data Frame. For example for row1 I want in the col4 the name "col3", for row2 the name "col1" and so on.

Is there a simple way to do this in R using dplyr or other basic functions?

In case of a tie I would like to chose first column in alphabetical order. — , Jun 28 '16 at 21:23

Mike H. · Accepted Answer · 2016-06-28T21:41:24.200

You don't actually even need to use apply. You can simply do an assignment:

df$col4 <- names(df)[max.col(df,ties.method="first")]

This also takes care of your which column to choose in a tie issue. To show how this works we can modify the original data slightly:

df <-read.table(header=T,text="   col1 col2 col3
  10   20   20
  20   10   10
  30   10   10")

Note how for the first row col2 and col3 have the same value. Per your comment on using the alphabetically first column in a tie, we would want col2.

Running the above code we see it works as expected:

 col1 col2 col3 col4
1   10   20   20 col2
2   20   10   10 col1
3   30   10   10 col1

Edit: In case you don't only have cols1 - cols3 in your dataset (there might be other variables you dont want to take the max of) a more complete solution would be to specify the columns you want to take the max of. Something like this:

Data:

#Note this has an extra id variable on it
df <-read.table(header=T,text="   col1 col2 col3 id
  10   20   20 100
  20   10   10 100
  30   10   10 100")

Then you can run the code to see how it works:

#The columns we want to take the max of
cols <- c("col1","col2","col3")
df$col4 <- names(df)[names(df) %in% cols][max.col(df[,cols],ties.method="first")]


  col1 col2 col3  id col4
1   10   20   20 100 col2
2   20   10   10 100 col1
3   30   10   10 100 col1

Perfect! This works, just one last question. Is there a way to select only a certain number of columns on which apply this function? If I have 10 columns in total and I want to count which column has the maximum value but only for columns 1:5, how can I write this subset? — , Jun 29 '16 at 07:50
Did you see my edit? You could specify the names of the columns you want that way. — Mike H., Jun 29 '16 at 12:37

score 0 · Answer 2 · answered Jun 28 '16 at 21:24

0

You can do this with which.max and names (or colnames if this is a matrix):

# Data from example
dat <- data.frame(col1 = c(10, 20, 30), col2 = c(20, 10, 10), col3 = c(30, 10, 10))

dat$col4 <- apply(dat, 1, function(x) names(dat)[which.max(x)])

answered Jun 28 '16 at 21:24

Jake Fisher

3,220
3
26
39

Subset rows with highest value from multiple columns

2 Answers2