1

How can we create columns with highest values for each row ?

References:

https://rdrr.io/cran/dplyr/man/top_n.html

Selecting top N values within a group in a column using R

For e.g.

library(tidyverse)

iris %>% glimpse()

# my attempt
x = iris %>% 
  select(-Species) %>%
  gather(measure,values) %>%
# hereafter got stuck
  mutate(top_1 =
                  top_2 = 
                  top3_3 = )

# expected_output contains same number of rows as input
expected_output = iris %>% mutate(top_1 = 1st highest value from the row  (row wise),
                                  top_2 = 2nd highest value from the row  (row wise),
                                  top_3 = 3rd highest value from the row (row wise))



# expected output first 3 rows looks like below:
iris[1:3,] %>% 
mutate(top_1 = c(5.1,4.9,4.7), top_2 = c(3.5,3.0,3.2), top_3 = c(1.4,1.4,1.3))
Abhishek
  • 407
  • 3
  • 18

1 Answers1

3

We can use apply row-wise, sort the vector in decreasing order and get top 3 values using head

df <- iris 
df[paste0("top_", 1:3)] <- t(apply(df[-5], 1, function(x) 
                             head(sort(x, decreasing = TRUE), 3)))

head(df)
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species top_1 top_2 top_3
#1          5.1         3.5          1.4         0.2  setosa   5.1   3.5   1.4
#2          4.9         3.0          1.4         0.2  setosa   4.9   3.0   1.4
#3          4.7         3.2          1.3         0.2  setosa   4.7   3.2   1.3
#4          4.6         3.1          1.5         0.2  setosa   4.6   3.1   1.5
#5          5.0         3.6          1.4         0.2  setosa   5.0   3.6   1.4
#6          5.4         3.9          1.7         0.4  setosa   5.4   3.9   1.7

A tidyverse alternative which involves some reshaping

library(dplyr)
library(tidyr)

iris %>%
  mutate(row = row_number()) %>%
  select(-Species) %>%
  gather(key, value, -row) %>%
  group_by(row) %>%
  top_n(3, value) %>%
  mutate(key = paste0("top", 1:3)) %>%
  spread(key, value) %>%
  ungroup %>%
  select(-row)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • tidyverse approach gives only new columns – Abhishek Sep 04 '19 at 13:50
  • @Abhi You can add `%>% bind_cols(iris)` to it. – Ronak Shah Sep 04 '19 at 13:57
  • Can we also display the column which contributed to the highest value ? expected_output (3 additional columns stating which column contributed to this result) iris[1:3,] %>% mutate(top_1 = c(5.1,4.9,4.7), top_2 = c(3.5,3.0,3.2), top_3 = c(1.4,1.4,1.3), top_1_from = c("Sepal.Length","Sepal.Length","Sepal.Length"), top_2_from = c("Sepal.Width","Sepal.Width","Sepal.Width"), top_3_from = c("Petal.Length","Petal.Length","Petal.Length")) – Abhishek Sep 04 '19 at 14:08
  • 1
    @Abhi For that we can use `order` and subset names `df[paste0("top_", 1:3)] <- t(apply(df[-5], 1, function(x) names(df)[head(order(x, decreasing = TRUE), 3)]))` – Ronak Shah Sep 04 '19 at 14:42
  • I have accepted your answer. Actually when I tried your approach to the actual data, following error occurs Error in `[<-.data.frame`(`*tmp*`, paste0("top_", 1:3), value = list(c(37.2727272727273, : replacement element 11 has 2 rows, need 879 – Abhishek Sep 04 '19 at 15:08
  • @Abhi There could be various reason to it, difficult to guess without actual data but one thing could be you have `NA`'s in data and when you apply `sort` on vector with `NA`'s it removes `NA` see, `sort(c(1, 4, NA, 3, 4))` hence, resulting in columns with less than 3 values. – Ronak Shah Sep 04 '19 at 15:21
  • thanks for your time, Yep, I agree with you. Actually, there were only 1 value in that row. So, top_2 and top_3 are inevitably NA – Abhishek Sep 04 '19 at 15:51