How to get names of certain variables in a column in R?

Question

I have a data frame that looks like:

    ID   CO1   CO2   ED1   ED2   max
    1     1     2     1     3     3
    2     1     3     3     2     3 
    3     4     2     2     1     4
    4     3     3     4     4     4
    ...
    10    1     1      1     1    1

How do I get R to give me the name(s) of the columns whose value equals the number in the max column, and assign them to a new column named “best”?

I want something like this:

    ID     CO1   CO2    ED1   ED2    max     best
    1       1     2      1     3      3       ED2         
    2       1     3      3     2      3       CO2
    3       4     2      2     1      4       CO1
    4       3     3      4     4      4       ED1
    ...
    10      1     1      1     1      1       CO2

If more than one column equals the value in the max column (as in row 2 or row 10), picking one of them at random is fine.

I have seen several solutions to similar problems, but none that works in my case.

Something like `apply(df, 1, function(x) names(df)[2:5][which(x[2:5] == x[6])[1]])` (not checked for errors) and assign it to the column. – Oliver, Apr 07 '21 at 07:26
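A minimal sketch of the row-wise idea from the comment above, with the random tie-break the question asks for. The data frame reuses the values from the `data` block in the first answer; `value_cols` and `pick_one()` are names made up for this illustration.

```r
# Sample data: same values as the `data` block in the first answer.
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

# Columns to search (assumption: everything except ID and max).
value_cols <- c("CO1", "CO2", "ED1", "ED2")

# Hypothetical helper: return one element of x at random,
# or NA if there is nothing to choose from.
pick_one <- function(x) {
  if (length(x) == 0L) return(NA_character_)
  x[sample.int(length(x), 1L)]
}

# For each row, collect the names of the columns equal to max and pick one.
df$best <- vapply(
  seq_len(nrow(df)),
  function(i) pick_one(value_cols[unlist(df[i, value_cols]) == df$max[i]]),
  character(1)
)

df
```

Rows with a single match (1 and 3 here) always give the same name; rows with several matches can change between runs unless you call `set.seed()` first.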

score 2 · Answer 1 · Ronak Shah · 2021-04-07T07:48:45.787

You can use `max.col`:

    cols <- grep('CO|ED', names(df), value = TRUE)
    df$best <- cols[max.col(df[cols] == df$max)]
    df

    #  ID CO1 CO2 ED1 ED2 max best
    #1  1   1   2   1   3   3  ED2
    #2  2   1   3   3   2   3  CO2
    #3  3   4   2   2   1   4  CO1
    #4  4   3   3   4   4   4  ED1
    #5 10   1   1   1   1   1  ED2

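By default `max.col` breaks ties at random, so rows 2, 4 and 5 can come out differently on each run. Set `ties.method = "first"` or `"last"` (see `?max.col`) to always get the first or last match in each row.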
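To make the tie handling concrete, here is a small sketch on the same sample data. `hits`, `first_match` and `last_match` are illustrative names, and the guard for rows without any match is an addition that the sample data does not actually need.

```r
# Sample data: same values as the `data` block of this answer.
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

cols <- grep("CO|ED", names(df), value = TRUE)

# Logical matrix: TRUE where a column equals the row's max value.
hits <- df[cols] == df$max

# Deterministic tie-breaks: first or last matching column per row.
first_match <- cols[max.col(hits, ties.method = "first")]
last_match  <- cols[max.col(hits, ties.method = "last")]

# max.col() still returns a column for rows that are all FALSE,
# so blank those out explicitly.
no_match <- rowSums(hits) == 0
first_match[no_match] <- NA
last_match[no_match]  <- NA

first_match
# "ED2" "CO2" "CO1" "ED1" "CO1"
last_match
# "ED2" "ED1" "CO1" "ED2" "ED2"
```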

data

    df <- structure(list(ID = c(1L, 2L, 3L, 4L, 10L), CO1 = c(1L, 1L, 4L, 
    3L, 1L), CO2 = c(2L, 3L, 2L, 3L, 1L), ED1 = c(1L, 3L, 2L, 4L, 
    1L), ED2 = c(3L, 2L, 1L, 4L, 1L), max = c(3L, 3L, 4L, 4L, 1L)), 
    row.names = c(NA, -5L), class = "data.frame")

edited Apr 07 '21 at 07:48

answered Apr 07 '21 at 07:34

Ronak Shah


"one at random is fine" so this answer always picks the first, right? Maybe add random bit, too? – zx8754 Apr 07 '21 at 07:46
Updated the answer to include random column name from the matches. Thanks. – Ronak Shah Apr 07 '21 at 07:52

score 0 · Answer 2 · answered Apr 07 '21 at 07:35

No need to be overly fancy:


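    library(dplyr)  # provides %>%, select() and matches()

    d <- read.table(text=
    "    ID   CO1   CO2   ED1   ED2   max
        1     1     2     1     3     3
        2     1     3     3     2     3
        3     4     2     2     1     3
        4     3     3     4     4     4
        10    1     1      1     1    1
    ", header=TRUE )

    # For each row, the position (within the CO/ED columns) of the largest value.
    # which.max() returns the first position when there are ties.
    max.columns <- d %>% select(matches("CO|ED")) %>%
        apply( 1, which.max )

    # The +1 skips the leading ID column, so this relies on the CO/ED
    # columns sitting directly after ID.
    d$best <- colnames(d)[ max.columns+1 ]

    d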

Outputs:

    > d
      ID CO1 CO2 ED1 ED2 max best
    1  1   1   2   1   3   3  ED2
    2  2   1   3   3   2   3  CO2
    3  3   4   2   2   1   3  CO1
    4  4   3   3   4   4   4  ED1
    5 10   1   1   1   1   1  CO1
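The `+1` offset only works while the CO/ED columns come directly after `ID`. A possible base R variant, sketched here under the assumption that the same `CO|ED` pattern identifies the columns, indexes into the selected names instead (`value_cols` is an illustrative name). Like the code above, it ignores the existing `max` column and always takes the first of several tied columns.

```r
d <- read.table(text = "
ID CO1 CO2 ED1 ED2 max
1   1   2   1   3   3
2   1   3   3   2   3
3   4   2   2   1   3
4   3   3   4   4   4
10  1   1   1   1   1
", header = TRUE)

# Names of the columns to compare, wherever they sit in the data frame.
value_cols <- grep("CO|ED", names(d), value = TRUE)

# which.max() gives the position of the first maximum in each row;
# use it to index the selected names directly, so no offset is needed.
d$best <- value_cols[apply(d[value_cols], 1, which.max)]

d$best
# "ED2" "CO2" "CO1" "ED1" "CO1"
```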

score 0 · Answer 3 · answered Apr 07 '21 at 11:39

Long base R solution, where "best" holds the names of all columns that match the max value (comma-separated when there are ties):

    # Store as a variable the names of the raw data vectors:
    # dvecs => character vector
    dvecs <- setdiff(names(df), c("ID", "max"))

    # Store a matrix of booleans denoting if the column contains the max value:
    # bool_test => logical matrix
    bool_test <- df$max == df[,dvecs]

    # Store a vector containing the names of the columns with the max values:
    # best => character vector
    df$best <- apply(
      data.frame(
        vapply(
          seq_along(dvecs),
          function(i) {
            ifelse(bool_test[, i], dvecs[i], NA_character_)
          },
          character(nrow(bool_test))
        )
      ), 
      1, 
      function(x) {
        paste0(na.omit(x), collapse = ", ")
      }
    )
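For comparison, a shorter sketch that should give the same comma-separated result on the sample data from the first answer. It is an alternative written for illustration, not the code above; `hits` and `best_all` are made-up names.

```r
# Sample data: same values as the `data` block in the first answer.
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

dvecs <- setdiff(names(df), c("ID", "max"))

# Logical matrix: TRUE where a data column equals the row's max value.
hits <- as.matrix(df[dvecs]) == df$max

# For each row, paste together the names of all matching columns.
df$best_all <- unname(apply(
  hits, 1,
  function(row) paste(dvecs[row], collapse = ", ")
))

df$best_all
# "ED2"  "CO2, ED1"  "CO1"  "ED1, ED2"  "CO1, CO2, ED1, ED2"
```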
