0

I have a large dataset that essentially looks like so:

ex <- data.frame(member = c(rep(1234,5)),
                 caseid = c(72,74,78,72,78),
                 code = c(2270,2508,2270,2134,2984),
                 pay = c(90,120,40,60,200))

I would like to find a way to replace the value in code with the code value corresponding to the row with the highest pay, while being grouped by the caseid column. The output that I would like would be like this:

ex1 <- data.frame(member = c(rep(1234,5)),
                  caseid = c(72,74,78,72,78),
                  code = c(2270,2508,2984,2270,2984),
                  pay = c(90,120,40,60,200))

I am trying to update the code column based on the pay column. So since row 1 has the highest pay of 90, replace row 4 with code 2270

Is there a way I could do this using data.table or tidyverse? The actual dataset has multiple members with different caseids so I would like something that could also group by member id to apply these changes by each member as well. Thanks in advance!

vdu12345
  • 73
  • 1
  • 8
  • 1
    `ex %>% group_by(caseid) %>% mutate(code = max(code))` – Ronak Shah Dec 21 '20 at 02:23
  • @RonakShah mutate(code = max(code)) will replace the code column with the highest code value, but how can I do it so it's based on the highest pay? For example, if row 1 with code 2006 had a pay of 90, how could I replace row 4 with 2006 since the pay in that row is 60? – vdu12345 Dec 21 '20 at 02:30

2 Answers2

2

You can arrange your data based on decreasing pay value and assign the first code value for each group.

library(dplyr)

ex %>%
  arrange(caseid, desc(pay)) %>%
  group_by(member, caseid) %>%
  mutate(code = first(code))

#  member caseid  code   pay
#   <dbl>  <dbl> <dbl> <dbl>
#1   1234     72  2270    90
#2   1234     72  2270    60
#3   1234     74  2508   120
#4   1234     78  2984   200
#5   1234     78  2984    40
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

You can update the code value in-place as below:

library(data.table)

setDT(ex)
ex[, code:=code[which.max(pay)], by=caseid][]
#>    member caseid code pay
#> 1:   1234     72 2270  90
#> 2:   1234     74 2508 120
#> 3:   1234     78 2984  40
#> 4:   1234     72 2270  60
#> 5:   1234     78 2984 200

Created on 2020-12-21 by the reprex package (v0.3.0)

zengc
  • 97
  • 9