Get column name that holds max value if higher than x

Question

I am trying to assign classifications but I'm running into some problems. The normal classification method takes the majority of the votes, but I want to be a bit stricter. Let's say I've got the following matrix:

     c1    c2    c3
x1   0.09  0.7   0.21
x2   0.34  0.33  0.33

If I take the majority of the votes, the classification will be as follows:

     class
x1   c2
x2   c1

But I want to set a threshold of, e.g., 0.40 of the votes, so that I would get these classifications:

     class
x1   c2
x2   unassigned

I know how to get the max in a row and how to get the column name that holds the max in that row (from this issue, but it doesn't solve mine), but for some reason I can't seem to require the max to be at least 0.40. Any help would be appreciated :)

Ronak Shah · Accepted Answer · 2020-09-03T14:49:27.370

0

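You can use max.col to get the column index of the maximum value in each row. Note that max.col breaks ties at random by default; pass ties.method = "first" if you need a deterministic result.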

cols <- names(df)[max.col(df) * NA^!rowSums(df > 0.4) > 0]
cols[is.na(cols)] <- 'unassigned'
cols
#[1] "c2"         "unassigned"

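The NA^!rowSums(df > 0.4) > 0 part evaluates to NA for rows that have no value > 0.4 and to 1 for all other rows, so multiplying the max.col index by it turns the index into NA for rows below the threshold.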
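The one-liner above can be hard to read because of operator precedence, so here is a step-by-step sketch of the same idea. The intermediate names (threshold, has_winner, mask, idx) are made up for illustration and are not part of the answer, and ties.method = "first" is an added assumption to make ties deterministic.

```r
# Same toy data as in the answer
df <- data.frame(c1 = c(0.09, 0.34), c2 = c(0.7, 0.33), c3 = c(0.21, 0.33),
                 row.names = c("x1", "x2"))

threshold <- 0.4                                   # cutoff taken from the question

# TRUE for rows where at least one value exceeds the cutoff
has_winner <- rowSums(df > threshold) > 0

# NA^FALSE is 1 and NA^TRUE is NA, so this is 1 for rows with a winner, NA otherwise
mask <- NA^!has_winner

# Column index of the row maximum, blanked out (NA) for rows below the cutoff
idx <- max.col(df, ties.method = "first") * mask

cols <- names(df)[idx]
cols[is.na(cols)] <- "unassigned"
cols
```

Splitting the expression this way also makes it easy to change the cutoff or the tie-breaking rule in one place.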

data

df <- structure(list(c1 = c(0.09, 0.34), c2 = c(0.7, 0.33), c3 = c(0.21, 
0.33)), class = "data.frame", row.names = c("x1", "x2"))

edited Sep 03 '20 at 14:49

answered Sep 03 '20 at 14:36


This doesn't work, cols is empty.. – kleurless Sep 03 '20 at 14:47
Updated the data which I used in the answer. This doesn't give me empty cols. – Ronak Shah Sep 03 '20 at 14:49
My data is in a (large) matrix, seems to work only with dataframes? – kleurless Sep 03 '20 at 15:02
If you have matrix, you need to use `colnames` instead of `names`. `cols <- colnames(df)[max.col(df) * NA^!rowSums(df > 0.4) > 0]` – Ronak Shah Sep 03 '20 at 15:06
Oh nice, did not know about that difference. This fix is more straight forward and less lines of code, so thanks mate! – kleurless Sep 04 '20 at 08:35
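For the matrix case raised in the comments, a possible alternative is to look up each row's maximum directly and compare it to the threshold, which avoids the NA^! trick. This is only a sketch: the matrix m, the threshold variable and the classes vector are hypothetical names, and ties.method = "first" is an assumption about how ties should be resolved.

```r
# Hypothetical matrix holding the same values as the question
m <- matrix(c(0.09, 0.34, 0.7, 0.33, 0.21, 0.33), nrow = 2,
            dimnames = list(c("x1", "x2"), c("c1", "c2", "c3")))

threshold <- 0.4

# Column index of the maximum in each row (leftmost column wins ties)
idx <- max.col(m, ties.method = "first")

# Pull out the row maxima with (row, column) matrix indexing
row_max <- m[cbind(seq_len(nrow(m)), idx)]

# Keep the column name only where the maximum exceeds the threshold
classes <- ifelse(row_max > threshold, colnames(m)[idx], "unassigned")
names(classes) <- rownames(m)
classes
```

Everything here is vectorised, so it should scale to a large matrix without a row-wise apply().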

score 0 · Answer 2 · answered Sep 03 '20 at 14:38

I would suggest this approach with apply():

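#Function: name of the largest element in x, if it exceeds the threshold
myfun <- function(x, threshold = 0.4)
{
  # No value above the threshold: leave the row unassigned
  # (this also avoids the warning max() gives on an empty vector)
  if(all(x <= threshold))
  {
    return('not assigned')
  }
  # which.max() returns the first maximum, so ties go to the leftmost column
  names(x)[which.max(x)]
}
#Apply row-wise (run this before df has a Class column)
df$Class <- apply(df,1,myfun)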

Output:

     c1   c2   c3        Class
x1 0.09 0.70 0.21           c2
x2 0.34 0.33 0.33 not assigned
