Replace column values with group by and condition on another column using R

Question

I have a large dataset that essentially looks like so:

ex <- data.frame(member = c(rep(1234,5)),
                 caseid = c(72,74,78,72,78),
                 code = c(2270,2508,2270,2134,2984),
                 pay = c(90,120,40,60,200))

I would like to find a way to replace the value in code with the code value corresponding to the row with the highest pay, while being grouped by the caseid column. The output that I would like would be like this:

ex1 <- data.frame(member = c(rep(1234,5)),
                  caseid = c(72,74,78,72,78),
                  code = c(2270,2508,2984,2270,2984),
                  pay = c(90,120,40,60,200))

I am trying to update the code column based on the pay column. So since row 1 has the highest pay of 90, replace row 4 with code 2270

Is there a way I could do this using data.table or tidyverse? The actual dataset has multiple members with different caseids so I would like something that could also group by member id to apply these changes by each member as well. Thanks in advance!

@RonakShah mutate(code = max(code)) will replace the code column with the highest code value, but how can I do it so it's based on the highest pay? For example, if row 1 with code 2006 had a pay of 90, how could I replace row 4 with 2006 since the pay in that row is 60? — vdu12345, Dec 21 '20 at 02:30

score 2 · Accepted Answer · answered Dec 21 '20 at 02:33

You can arrange your data based on decreasing pay value and assign the first code value for each group.

library(dplyr)

ex %>%
  arrange(caseid, desc(pay)) %>%
  group_by(member, caseid) %>%
  mutate(code = first(code))

#  member caseid  code   pay
#   <dbl>  <dbl> <dbl> <dbl>
#1   1234     72  2270    90
#2   1234     72  2270    60
#3   1234     74  2508   120
#4   1234     78  2984   200
#5   1234     78  2984    40

score 0 · Answer 2 · answered Dec 21 '20 at 03:10

You can update the code value in-place as below:

library(data.table)

setDT(ex)
ex[, code:=code[which.max(pay)], by=caseid][]
#>    member caseid code pay
#> 1:   1234     72 2270  90
#> 2:   1234     74 2508 120
#> 3:   1234     78 2984  40
#> 4:   1234     72 2270  60
#> 5:   1234     78 2984 200

^{Created on 2020-12-21 by the reprex package (v0.3.0)}

Replace column values with group by and condition on another column using R

2 Answers2