2

I am trying to find the do a function which is similar to a vlookup in excel but which returns the maximum value and the other values in the same row. The data frame looks like this:

enter image description here enter image description here

The data frame which I am dealing with are given below:

dput(Book3)
structure(list(Item = c("ABA", "ABB", "ABC", "ABD", "ABE", "ABF"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L))

dput(Book4)
structure(list(Item = c("ABA", "ABB", "ABC", "ABD", "ABE", "ABF", 
"ABA", "ABB", "ABC", "ABD", "ABE", "ABF", "ABA", "ABB", "ABC", 
"ABD", "ABE", "ABF"), Max1 = c(12, 68, 27, 17, 74, 76, 78, 93, 
94, 98, 46, 90, 5, 58, 67, 64, 34, 97), Additional1 = c(40, 66, 
100, 33, 66, 19, 8, 70, 21, 93, 48, 34, 44, 89, 74, 20, 0, 47
), Additional2 = c(39, 31, 85, 58, 0, 2, 57, 28, 31, 32, 15, 
22, 93, 41, 57, 81, 95, 46)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L))

The Expected output for this is given below:

enter image description here

user20203146
  • 447
  • 7

4 Answers4

4

You are looking for slice_max:

library(dplyr)
Book4 %>% 
  group_by(Item) %>% 
  slice_max(Max1)

#   Item   Max1 Additional1 Additional2
# 1 ABA      78           8          57
# 2 ABB      93          70          28
# 3 ABC      94          21          31
# 4 ABD      98          93          32
# 5 ABE      74          66           0
# 6 ABF      97          47          46
Maël
  • 45,206
  • 3
  • 29
  • 67
  • I worked with this data set, but I tried I performed this on a dataset of 16410 rows which has unique value of 601 but while running this code, I was expecting that slice_max would take the maximum value of these 601 unique items based on the column specified inside slice_max, but it did not do that. Instead it gave 8223 rows and it had duplicates? Do you know why? – user20203146 Nov 03 '22 at 18:25
  • This is surprising. Maybe edit your question with a sample of that data that reproduces the problem – Maël Nov 03 '22 at 18:27
  • 2
    @Maël You need `with_ties = FALSE` (by default it is TRUE) i.e. suppose you have a group with only single value and 40 is the group size, then with_ties = TRUE, will get all the rows for that group because it is the single max value – akrun Nov 03 '22 at 18:31
  • @akrun You are spot on. I tried it with with_ties=FALSE and it gave 601 unique values and was spot on.. Thank you so much for the assist. (akrun & Mael) – user20203146 Nov 03 '22 at 18:41
  • @akrun I was wondering to add to this, Suppose we have multiple items with the same Max value, Can we select a select category where we can check for another aspect. Like Minimum of another value in a different column named (Difference). So in case its a tie while selecting slice_max(Max1) we can check the tie breaking item with slice_min(Difference). Is it possible or is that too much? – user20203146 Dec 08 '22 at 09:07
  • @Maël A additional question on top of your solution. Could you please check my comment to akrun? – user20203146 Dec 08 '22 at 09:07
  • @user20203146 you may be able to do this in `slice` or `filter` instead of `slice_max` – akrun Dec 08 '22 at 16:12
2

Using R base aggregate + max + merge

   > merge(Book4, aggregate(Max1~Item, data = Book4, max), by = c("Item", "Max1"))
  Item Max1 Additional1 Additional2
1  ABA   78           8          57
2  ABB   93          70          28
3  ABC   94          21          31
4  ABD   98          93          32
5  ABE   74          66           0
6  ABF   97          47          46
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
  • This is not matching the OP's desired output, as it is finding the max within each column (by group), not the row with the max `Max1` values. – r2evans Nov 03 '22 at 16:32
2

Using base R

subset(Book4, Max1 == ave(Max1, Item, FUN = max))

-output

# A tibble: 6 × 4
  Item   Max1 Additional1 Additional2
  <chr> <dbl>       <dbl>       <dbl>
1 ABE      74          66           0
2 ABA      78           8          57
3 ABB      93          70          28
4 ABC      94          21          31
5 ABD      98          93          32
6 ABF      97          47          46
akrun
  • 874,273
  • 37
  • 540
  • 662
2

An alternative base solution that is more resilient to floating-point precision problems (c.f., Why are these numbers not equal?, https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f). It also allows two behavior options if there are duplicate max-values:

  1. if you want all of them, use ties.method = "min";
  2. if you want the first (or just one) of them, then ties.method = "first".
Book4[ave(Book4$Max1, Book4$Item, FUN = function(z) rank(-z, ties.method = "first")) == 1,]
# # A tibble: 6 x 4
#   Item   Max1 Additional1 Additional2
#   <chr> <dbl>       <dbl>       <dbl>
# 1 ABE      74          66           0
# 2 ABA      78           8          57
# 3 ABB      93          70          28
# 4 ABC      94          21          31
# 5 ABD      98          93          32
# 6 ABF      97          47          46
r2evans
  • 141,215
  • 6
  • 77
  • 149