0

I am trying to find first max, second max, and third max value and corresponding col names for each row, but unable to do that in r. Please help.

Here is how the dataframe looks like:

              X1    X2    X3   X4    X5   X6   X7    X8    X9    X10   X11  X12   
      10003   0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0   0.0   0.0   0.0  0.0       
      10006   0.0   0.0   0.0  0.0   0.0  0.0 16.7   0.0   0.0   0.0   0.0  0.0       
      10007   0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0   0.0   0.0   0.0  0.0       
      10008   0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0   0.0   0.0   0.0  0.0       
      10010   0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0   0.0   0.0   0.0  0.0       
      10014   0.0   0.0   0.0  0.0   0.0  0.0  0.0   0.0   0.0   0.0   0.0  0.0   
Taher A. Ghaleb
  • 5,120
  • 5
  • 31
  • 44
  • Your data seems to be all zeros except for one value. – G5W Jan 02 '19 at 01:24
  • 2
    Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Jan 02 '19 at 01:32
  • It is a massive data frame, I just posted first few rows and fewer columns. – user10611398 Jan 02 '19 at 01:36
  • Your dataset is not reproducible and your question is vague. – www Jan 02 '19 at 01:37
  • and10003,10006 etc are rows names, not values .I tried to run with apply function across rows but it does not give me the colnames – user10611398 Jan 02 '19 at 01:38
  • 1
    The question's title says "max" and the text says "min"! – Taher A. Ghaleb Jan 02 '19 at 01:39
  • Tried this, but has error – user10611398 Jan 02 '19 at 01:39
  • Outdata<-setNames( data.frame( t(apply(mat1_2_df, 1, function(x) { ind <- which.max(x)[1] c(ind, x[ind]) })), t(apply(mat1_2_df, 1, function(x) { ind2<-which.max(x[-ind])[1] c(ind2, x[ind2])})), t(apply(mat1_2_df, 1, function(x) { ind3<-which.max(x[-c(ind,ind2)]) c(ind3, x[ind3])})) ), c( "First","Value","Second","value2","Third","value3")) – user10611398 Jan 02 '19 at 01:39
  • Sorry it is max values – user10611398 Jan 02 '19 at 01:42
  • Have you looked at? https://stackoverflow.com/questions/2453326/fastest-way-to-find-second-third-highest-lowest-value-in-vector-or-column – discipulus Jan 02 '19 at 01:45
  • yes, I looked at this, but unable to replicate on the dataframe. I will add some more rows – user10611398 Jan 02 '19 at 01:47
  • it does not allow me to add more rows, I am trying to get the max, second max and third max and corresponding column names and in case these are zeros then still get at least one corresponding colname against 0.0 values. – user10611398 Jan 02 '19 at 01:51
  • I have made up this dataframe to illustrate it better. – user10611398 Jan 02 '19 at 02:01
  • x1 x2 x3 x4 x5 x6 x7 x8 x9 1003a 0 45.7 0 22.9 0 13.7 0 0 23.1 1004a 22.2 0 13.2 0 5.4 0 9.7 0 0 1005a 0 0 0 12 2.1 0 0 3.2 0 1006a 1.2 0 1.2 0 43.9 43.9 0 0 57.6 – user10611398 Jan 02 '19 at 02:01

3 Answers3

1

This is the sample data you posted in your comment:

data <-read.table(text="       x1    x2    x3     x4    x5    x6   x7   x8    x9
                        1003    0  45.7     0   22.9     0  13.7    0    0  23.1 
                        1004 22.2     0  13.2      0   5.4     0  9.7    0     0 
                        1005    0     0     0     12   2.1     0    0  3.2     0  
                        1006  1.2     0   1.2      0  43.9  43.9    0    0  57.6",
                    header=T)

You can use dplyr and tidyverse to acheive this.


The following code will give you the maximum three columns across all the rows:

library(dplyr)
library(tidyverse)

data %>% 
  rownames_to_column() %>%
  gather(column, value, -rowname) %>%
  group_by(rowname) %>% 
  arrange(desc(value)) %>% 
  head(3) 

This will give you the following result:

# A tibble: 3 x 3
# Groups:   rowname [3]
#   rowname column value
#   <chr>   <chr>  <dbl>
# 1 1006    x9      57.6
# 2 1003    x2      45.7
# 3 1006    x5      43.9

If you want to get the maximum three values for each row, you can do it as follows:

result <- data %>% 
  rownames_to_column() %>%
  gather(column, value, -rowname) %>%
  group_by(rowname) %>% 
  mutate(max = rank(-value)) %>%
  filter(max <= 3) %>% 
  arrange(rowname, max)

Which will give you the following result:

# A tibble: 12 x 4
# Groups:   rowname [4]
#    rowname column value   max
#    <chr>   <chr>  <dbl> <dbl>
#  1 1003    x2      45.7   1  
#  2 1003    x9      23.1   2  
#  3 1003    x4      22.9   3  
#  4 1004    x1      22.2   1  
#  5 1004    x3      13.2   2  
#  6 1004    x7       9.7   3  
#  7 1005    x4      12     1  
#  8 1005    x8       3.2   2  
#  9 1005    x5       2.1   3  
# 10 1006    x9      57.6   1  
# 11 1006    x5      43.9   2.5
# 12 1006    x6      43.9   2.5

To summarize the result for each row, use the following code:

result %>% 
  mutate(result = paste0(column, "=", value, collapse = ", ")) %>% 
  select(result) %>% 
  distinct()

Which will give you the following result:

# A tibble: 4 x 2
# Groups:   rowname [4]
#   rowname result                   
#   <chr>   <chr>                    
# 1 1003    x2=45.7, x9=23.1, x4=22.9
# 2 1004    x1=22.2, x3=13.2, x7=9.7 
# 3 1005    x4=12, x8=3.2, x5=2.1    
# 4 1006    x9=57.6, x5=43.9, x6=43.9


Hope it helps.

Taher A. Ghaleb
  • 5,120
  • 5
  • 31
  • 44
  • Thanks for your answer. But is there any simple way to do this as I am new and unable to understand the above code and I am looking to get six columns against each row of dataframe, three giving max,second max,third max and also with their collnames(e.g. 45.7,x2,23.1,x9,22.9,x4) for the first row. – user10611398 Jan 02 '19 at 02:28
  • I see what you mean. Please have a look at my updated answer. – Taher A. Ghaleb Jan 02 '19 at 03:02
0

You can use

max.names = apply(data, 1, function(x) names(sort(x, decreasing = T)[1:3]))
max.vals = apply(data, 1, function(x) sort(x, decreasing = T)[1:3])
data = cbind(data, t(max.names), t(max.vals))
#        x1   x2   x3   x4   x5   x6  x7  x8   x9  1  2  3    1    2    3
# 1003  0.0 45.7  0.0 22.9  0.0 13.7 0.0 0.0 23.1 x2 x9 x4 45.7 23.1 22.9
# 1004 22.2  0.0 13.2  0.0  5.4  0.0 9.7 0.0  0.0 x1 x3 x7 22.2 13.2  9.7
# 1005  0.0  0.0  0.0 12.0  2.1  0.0 0.0 3.2  0.0 x4 x8 x5 12.0  3.2  2.1
# 1006  1.2  0.0  1.2  0.0 43.9 43.9 0.0 0.0 57.6 x9 x5 x6 57.6 43.9 43.9
dww
  • 30,425
  • 5
  • 68
  • 111
-1

Here is my approach:

 # Make up data because yours is pretty unreproducible:
 df <- data.frame(X1=1:5, X2=c(3,5,1,6,7))

 # combine and sort the data by decreasing value:
 a <- sort(dplyr::combine(df), decreasing = T)[1:3]

 # For loop to get the indexes:
 for(i in 1:length(a)){
    print(which(df==a[i], arr.ind = T))
 }

This will give you what you need. Replace print with whatever you want to do (eg assign or whatever you need)

morgan121
  • 2,213
  • 1
  • 15
  • 33