Get column name for first occurrence of most frequent value in a row

Question

I have a data frame that looks like the following:

week_0 <- c(5,0,1,0,0,1)
week_1 <- c(5,0,4,0,2,1)
week_2 <- c(5,0,4,0,8,1)
week_3 <- c(5,0,4,0,8,3)
week_4 <- c(1,0,4,0,8,3)
week_5 <- c(1,0,4,0,8,3)
week_6 <- c(1,0,4,0,1,3)
week_7 <- c(1,0,4,0,1,3)
week_8 <- c(1,0,6,0,3,4)
week_9 <- c(2,4,6,7,3,4)
week_10 <- c(2,4,6,7,3,4)
Participant <- c("Lion","Cat","Dog","Snake","Tiger","Mouse")
test_data <- data.frame(Participant,week_0,week_1,week_2,week_3,week_4,week_5,week_6,week_7,week_8,week_9,week_10)

> test_data

    Participant week_0 week_1 week_2 week_3 week_4 week_5 week_6 week_7 week_8 week_9 week_10
1        Lion      5      5      5      5      1      1      1      1      1      2       2
2         Cat      0      0      0      0      0      0      0      0      0      4       4
3         Dog      1      4      4      4      4      4      4      4      6      6       6
4       Snake      0      0      0      0      0      0      0      0      0      7       7
5       Tiger      0      2      8      8      8      8      1      1      3      3       3
6       Mouse      1      1      1      3      3      3      3      3      4      4       4

I would like to identify the value in each row that appears more often than any other value. For example, in the first row that value is 1, and the output I want is week_4. In the second row the most frequent value is 0, and the output I want is week_0, and so on. The end result should be: week_4, week_0, week_1, week_0, week_2, week_3. I have tried:

apply(test_data, 1, function(x) names(which.max(table(x))))

but I do not get the result that I'm searching for. Any suggestions on how to do this?

Why exactly would the first row return week 4, if there are other weeks that also have 1? You want to return the position of the first occurrence of the most common value? — camille, Mar 17 '23 at 01:51
@camille Yes, I would like to return the position of the first occurrence of the most common value. — VR28, Mar 17 '23 at 01:54
Could there be tied values? If so, what result would you expect then? — Ritchie Sacramento, Mar 17 '23 at 01:55

Darren Tsai · Answer 1 · 2023-03-17T02:24:08.083

A dplyr solution with add_count + slice_max:

library(dplyr)

test_data %>%
  tidyr::pivot_longer(starts_with('week')) %>%
  add_count(Participant, value) %>%
  slice_max(n, by = Participant, with_ties = FALSE)

# # A tibble: 6 × 4
#   Participant name   value     n
#   <chr>       <chr>  <dbl> <int>
# 1 Lion        week_4     1     5
# 2 Cat         week_0     0     9
# 3 Dog         week_1     4     7
# 4 Snake       week_0     0     9
# 5 Tiger       week_2     8     4
# 6 Mouse       week_3     3     5

If there are "ties" and you want to include all ties in the output:

test_data %>%
  tidyr::pivot_longer(starts_with('week')) %>%
  add_count(Participant, value) %>%
  slice_max(n, by = c(Participant, value), with_ties = FALSE) %>%
  slice_max(n, by = Participant)

score 2 · Answer 2 · answered Mar 17 '23 at 05:00

Try with fmode from collapse

library(collapse)
names(test_data)[-1][max.col(test_data[-1] == dapply(test_data[-1], 
    MARGIN = 1, fmode), "first")]

-output

[1] "week_4" "week_0" "week_1" "week_0" "week_2" "week_3"
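
For anyone without collapse installed, the same three steps (row-wise mode, comparison against each row, leftmost match) can be sketched in base R. This is only an illustration on toy data: `weeks` and the helper `row_mode` are made up for the example, and `row_mode` breaks ties in favour of the smallest value, which may differ from how `fmode` resolves them.

```r
# Toy data (assumed for illustration; not the asker's data)
weeks <- data.frame(week_0 = c(5, 0), week_1 = c(1, 0), week_2 = c(1, 4))

# Hypothetical helper: most frequent value in a numeric vector
row_mode <- function(x) as.numeric(names(which.max(table(x))))

modes <- apply(weeks, 1, row_mode)              # one mode per row
hits  <- as.matrix(weeks) == modes              # TRUE where a cell equals its row's mode
first <- max.col(hits, ties.method = "first")   # leftmost TRUE in each row
result <- names(weeks)[first]
result
# [1] "week_1" "week_0"
```

The comparison works because a matrix is stored column by column, so a vector with one entry per row is recycled down each column.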

akrun

score 1 · Accepted Answer · answered Mar 17 '23 at 01:55

Your code is a good first step. You can use the result to match() its first position in the row, then use this position to index into the column names:

apply(test_data[, -1], 1, function(x) {
  val <- names(which.max(table(x)))
  names(test_data)[-1][[match(val, x)]]
})
# "week_4" "week_0" "week_1" "week_0" "week_2" "week_3"

Note I use test_data[, -1] to exclude the Participant column; otherwise, the code would return the participant name if there’s no value that occurs more than once, which presumably isn’t what you want.
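
On the tie question raised in the comments: `which.max(table(x))` picks the smallest of the tied values, because `table()` sorts its names. If ties should instead go to the value that appears first in the row, one possible variation is sketched below. The helper name `first_mode_col` and the `tied` data frame are made up for this example.

```r
# Hypothetical helper: column name of the first occurrence of the most
# frequent value; ties go to the value that appears earliest in the row.
first_mode_col <- function(x, cols) {
  # match(x, x) maps every element to the position of its first occurrence,
  # so tabulate() stores each value's count at that first position
  counts <- tabulate(match(x, x))
  cols[which.max(counts)]  # which.max returns the earliest maximum
}

# Toy data with a two-way tie in each row (assumed for illustration)
tied <- data.frame(
  Participant = c("A", "B"),
  week_0 = c(3, 1),
  week_1 = c(1, 2),
  week_2 = c(3, 2),
  week_3 = c(1, 1)
)

res <- apply(tied[, -1], 1, first_mode_col, cols = names(tied)[-1])
unname(res)
# [1] "week_0" "week_0"
```

For row A (3, 1, 3, 1) the `table()` version above would return week_1, since 1 sorts before 3, while this variation returns week_0.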

score 0 · Answer 4 · answered Mar 17 '23 at 03:44

First define a function to find the mode of a vector:

Mode <- \(x) names(sort(-table(x)))[1]

The hard part is done. Now use dplyr's rowwise() and c_across():

library(dplyr)

test_data %>%
  rowwise() %>%
  mutate(
    m = {
      x <- c_across(week_0:week_10) # get row as a vector
      names(x) <- names(test_data)[-1]
      index <- which(x == Mode(x))[1] # first occurrence of mode in 'x'
      names(x)[index]
    }
  )
