I am trying to get the index of the column that has the highest value among selected columns. When trying with dplyr, my attempts are not giving me the right result.

library(dplyr);library(magrittr)
DF1 <- data.frame(Factor1 = c(1,2,4),Factor2 = c(3,1,1),Factor3 = c(9,1,0)) %>% 
    mutate(max_ind = which.max(c(.$Factor1,.$Factor2,.$Factor3))) %>% print
          Factor1 Factor2 Factor3 max_ind
        1       1       3       9       7
        2       2       1       1       7
        3       4       1       0       7

Where is the mistake? Why is dplyr behaving this way? I could probably use rowwise, but that does not seem to be the best way to go. Any thoughts on how to do this in base, tidyverse or data.table?

Edit-1 (some other attempt)

With sapply I am getting this:

DF1 <- data.frame(Factor1 = c(1,2,4),Factor2 = c(3,1,1),Factor3 = c(9,1,0)) %>%
    mutate(max_ind = which.max(c(Factor1,Factor2,Factor3)),
           max_ind2 = sapply(X = ., function(x) which.max(c(x[Factor1],x[Factor2],x[Factor3])))) %>% print
  Factor1 Factor2 Factor3 max_ind max_ind2
1       1       3       9       7        4
2       2       1       1       7        1
3       4       1       0       7        1

But here I see 4 in the first row while it should be 3.

Edit-2

I am also looking for a solution where we can specify the columns to be used for the comparison (which.max)

Edit-3

The base, purrr::pmap and dplyr::mutate examples below all work.

#R>DF1 <- data.frame(Factor1 = c(1,2,4,1),Factor2 = c(3,1,1,6),Factor3 = c(9,1,0,4)) 
#R>DF1 %>% mutate(max_ind_purrr = pmap(.l = list(Factor1,Factor2,Factor3),~which.max(c(...)))) %>% print()
  Factor1 Factor2 Factor3 max_ind_purrr
1       1       3       9             3
2       2       1       1             1
3       4       1       0             1
4       1       6       4             2
#R>DF1 %>% mutate(max_ind_dplyr=max.col(DF1[,1:3]))
  Factor1 Factor2 Factor3 max_ind_dplyr
1       1       3       9             3
2       2       1       1             1
3       4       1       0             1
4       1       6       4             2
#R>DF1 <- transform(DF1,max_ind_base=apply(DF1[, c('Factor1','Factor2','Factor3')],1,which.max))%>% print
  Factor1 Factor2 Factor3 max_ind_base
1       1       3       9            3
2       2       1       1            1
3       4       1       0            1
4       1       6       4            2
Stat-R

3 Answers

4

I think you are asking for a row-wise comparison: for each row, find the index of the column that holds the maximum value. sapply does not do this because, by default, it iterates over the columns of the data frame rather than its rows. which.max likewise works on a single vector, so which.max(c(Factor1, Factor2, Factor3)) returns one position in the concatenated nine-element vector (7, where the 9 sits), and mutate recycles that value down every row.

This is basically the difference between the max function and the pmax function. A row-wise version of which.max is max.col so you could specify:

DF1 %>% mutate(max_ind=max.col(DF1))

You can then choose which columns to specify:

# only considering columns 1 and 2
DF1 %>% mutate(max_ind=max.col(DF1[,1:2]))
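To make the difference concrete, here is a minimal base R sketch using the data from the question. It shows why the original attempt returned 7 and what max.col returns instead. The variable names flat_index and row_index are only illustrative. One thing to be aware of: max.col breaks ties at random by default, so ties.method = "first" is passed here to get reproducible results when a row has several equal maxima.

```r
DF1 <- data.frame(Factor1 = c(1, 2, 4),
                  Factor2 = c(3, 1, 1),
                  Factor3 = c(9, 1, 0))

# c() glues the three columns into one vector of length 9:
# 1 2 4 3 1 1 9 1 0. The maximum (9) is at position 7.
flat_index <- which.max(c(DF1$Factor1, DF1$Factor2, DF1$Factor3))

# max.col works row by row and returns one column index per row.
# ties.method = "first" picks the leftmost column on ties
# (the default, "random", chooses among tied columns at random).
row_index <- max.col(as.matrix(DF1), ties.method = "first")

print(flat_index)
print(row_index)
```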
Chris
  • Although it is less to write, I strongly recommend specifying the column names explicitly. Otherwise it could turn into a real mess if the column order changes, e.g. in a large project. – jay.sf Jul 14 '19 at 11:22
  • Bear in mind that `max.col` always coerces its input to a matrix, so it's not ideal for data frames. – Alexis Jul 14 '19 at 15:41
3

In base R you could do:

DF1 <- transform(DF1, max_ind=apply(DF1, 1, which.max))

However, as @DavidArenburg wisely pointed out in the comments, there is also the vectorized max.col():

DF1 <- transform(DF1, max_ind=max.col(DF1))
#   Factor1 Factor2 Factor3 max_ind
# 1       1       3       9       3
# 2       2       1       1       1
# 3       4       1       0       1

To restrict the comparison to specific columns, apply max.col to a subset selected by column name.

DF1 <- transform(DF1, max_ind_subset=max.col(DF1[c("Factor1", "Factor2")]))
#   Factor1 Factor2 Factor3 max_ind_subset
# 1       1       3       9              2
# 2       2       1       1              1
# 3       4       1       0              1
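Note that an index computed on a subset refers to a position within that subset, not within the full data frame. The sketch below is one possible way to handle this, assuming you want both the name of the winning column and its position in the original frame. The names cols, idx, max_ind and max_col_name are hypothetical and chosen only for illustration.

```r
DF1 <- data.frame(Factor1 = c(1, 2, 4),
                  Factor2 = c(3, 1, 1),
                  Factor3 = c(9, 1, 0))

# Hypothetical selection: compare only Factor1 and Factor3.
cols <- c("Factor1", "Factor3")

# Index within the subset (1 = Factor1, 2 = Factor3).
idx <- max.col(as.matrix(DF1[cols]), ties.method = "first")

# Translate the subset index into a column name, then into the
# position of that column in the full data frame.
max_col_name <- cols[idx]
max_ind <- match(max_col_name, names(DF1))

DF1$max_col_name <- max_col_name
DF1$max_ind <- max_ind
print(DF1)
```

Looking columns up by name this way keeps the result stable if the column order of the data frame changes later.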

Data

DF1 <- structure(list(Factor1 = c(1, 2, 4), Factor2 = c(3, 1, 1), Factor3 = c(9, 
1, 0)), class = "data.frame", row.names = c(NA, -3L))
jay.sf
  • I think OP wanted the max ind of the column and not the row which you are calculating here - it just so happens the result is the same in both cases – Chris Jul 14 '19 at 10:47
  • @Chris Actually I am looking for the index of the column with maximum value for **every row** – Stat-R Jul 14 '19 at 10:50
  • @Stat-R yes, that's what I mean. I think that's what Jack's answer does but not Jay's, do you agree? – Chris Jul 14 '19 at 10:52
  • @Chris I am sorry I cannot but see that the answers are identical in both the cases. – Stat-R Jul 14 '19 at 10:54
  • @Stat-R, it's because your example is symmetrical. Change it to factor1=c(1,4,2), for instance, and the answers will diverge. I think you are looking for Jack's answer but I may be wrong – Chris Jul 14 '19 at 10:57
  • @Chris I was confused at first, but have noticed it now. Thank you for paying attention, I've edited my answer. – jay.sf Jul 14 '19 at 11:02
  • @jay.sf I was wondering how you do this when you must specify the column names. – Stat-R Jul 14 '19 at 11:05
  • @Chris The column names which you want to include in maximum calculation? – jay.sf Jul 14 '19 at 11:08
  • @jay.sf Not sure that comment was directed at me? But I agree your edited answer does the job now +1 :) – Chris Jul 14 '19 at 11:10
  • @Stat-R The column names which you want to include in maximum calculation? – jay.sf Jul 14 '19 at 11:11
  • @jay.sf, I have used your answer and extended it to specify columns. See my edit-3. Thank you. – Stat-R Jul 14 '19 at 11:13
2

Try this using purrr::pmap:

library(dplyr)
library(purrr)

DF1 <-
  data.frame(
    Factor1 = c(1, 2, 4),
    Factor2 = c(3, 1, 1),
    Factor3 = c(9, 1, 0)
  ) %>%
  # pmap_int walks the three columns in parallel, one row at a time
  mutate(max_ind = pmap_int(list(Factor1, Factor2, Factor3), ~which.max(c(...))))

Output:

  Factor1 Factor2 Factor3 max_ind
1       1       3       9       3
2       2       1       1       1
3       4       1       0       1
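Since the question mentions rowwise, here is a sketch of that route as well, assuming dplyr 1.0.0 or later (where c_across is available). The vector cols and the name result are hypothetical; cols holds the columns to compare, which also covers the request to choose the columns explicitly. This is likely to be slower than max.col on large data, because it evaluates which.max once per row.

```r
library(dplyr)

DF1 <- data.frame(Factor1 = c(1, 2, 4),
                  Factor2 = c(3, 1, 1),
                  Factor3 = c(9, 1, 0))

# Hypothetical selection of columns to compare, given by name.
cols <- c("Factor1", "Factor2", "Factor3")

result <- DF1 %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into
  # one vector; unname() drops any names which.max may carry over.
  mutate(max_ind = unname(which.max(c_across(all_of(cols))))) %>%
  ungroup()

print(as.data.frame(result))
```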
Jack Brookes