I have a table that looks like this:

PARTY_ID | PARTYNUM | WEIGHTED_CONF | CONF_SCORE
1        | ABC      | HIGH          | 3
1        | ABC      | HIGH          | 3
1        | ABC      | MEDIUM        | 2
2        | DEF      | LOW           | 1
2        | DEF      | MEDIUM        | 2
2        | DEF      | HIGH          | 3
3        | GHI      | PERFECT       | 4
3        | GHI      | HIGH          | 3
3        | GHI      | HIGH          | 3

I would like to create a new field that takes the highest 'CONF_SCORE' by each 'PARTYNUM' group.

Desired output

PARTY_ID | PARTYNUM | WEIGHTED_CONF | CONF_SCORE | MAX
1        | ABC      | HIGH          | 3          | 3
1        | ABC      | HIGH          | 3          | 3
1        | ABC      | MEDIUM        | 2          | 3
2        | DEF      | LOW           | 1          | 3
2        | DEF      | MEDIUM        | 2          | 3
2        | DEF      | HIGH          | 3          | 3
3        | GHI      | PERFECT       | 4          | 4
3        | GHI      | HIGH          | 3          | 4
3        | GHI      | HIGH          | 3          | 4

I tried this, but the new column contains `-Inf`:

new_dataset_final <- new_dataset1 %>%
group_by(PARTYNUM) %>%
  mutate(MAX = max(as.numeric(new_dataset$Conf_Score)))
Dinho
  • "Almost never" should you use `new_dataset$` *inside* of a dplyr-verb: when you do that, you ignore the `group_by` grouping completely. Remove that and it returns what you want. – r2evans Sep 22 '21 at 16:32
  • Also, btw, `Conf_Score` and `CONF_SCORE` are not the same. `max(as.numeric(NAME_OF_NONEXISTENT_COLUMN))` is the same as `max(as.numeric(NULL))` which usually does two things: ***it warns you*** with "no non-missing arguments", and it returns `-Inf`. In general, do not ignore warnings, they likely are telling you at least where (if not how) you are munging your data. Either that, or it fails with `object 'Conf_Score' not found`. Either way, they're different. – r2evans Sep 22 '21 at 16:36
  • Thank you - appreciate the clear and helpful standards I should be using. – Dinho Sep 22 '21 at 16:38
  • FYI, a good discussion of summarizing by group: https://stackoverflow.com/q/11562656/3358272. It isn't the "gospel", so to speak, but it has a lot of good examples, including dplyr-based code, some good examples to imitate as you become more comfortable with dplyr in general and grouping ops specifically. – r2evans Sep 22 '21 at 16:40
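
To see where the `-Inf` comes from, here is a minimal sketch of the behaviour described in the comments. The data frame `df` is a hypothetical stand-in, not the asker's data; the only assumption is that it has a `CONF_SCORE` column and no `Conf_Score` column.

```r
# Hypothetical stand-in with only the upper-case column name
df <- data.frame(CONF_SCORE = c(3, 2, 1))

# `$` with a name that does not exist (names are case-sensitive) returns NULL
is.null(df$Conf_Score)
# [1] TRUE

# as.numeric(NULL) is an empty vector; max() of it warns
# ("no non-missing arguments to max; returning -Inf") and returns -Inf.
# The warning is suppressed here only to keep the example quiet.
res <- suppressWarnings(max(as.numeric(df$Conf_Score)))
res
# [1] -Inf
```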

2 Answers

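As r2evans mentions, writing `new_dataset$...` inside `mutate()` pulls the column from a whole data frame instead of from the current group, so the `group_by()` is ignored. On top of that, the column is named `CONF_SCORE`, not `Conf_Score`. Referring to the bare column name should work: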

new_dataset_final <- new_dataset1 %>%
  group_by(PARTYNUM) %>%
  mutate(MAX = max(as.numeric(CONF_SCORE)))
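
For a self-contained check, the sketch below rebuilds the sample table from the question and applies the same pipeline. The trailing `ungroup()` is an addition of mine rather than part of the fix. It only drops the grouping so that later operations are not applied per group by accident.

```r
library(dplyr)

# Sample data copied from the question
new_dataset1 <- data.frame(
  PARTY_ID      = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  PARTYNUM      = c("ABC", "ABC", "ABC", "DEF", "DEF", "DEF", "GHI", "GHI", "GHI"),
  WEIGHTED_CONF = c("HIGH", "HIGH", "MEDIUM", "LOW", "MEDIUM", "HIGH",
                    "PERFECT", "HIGH", "HIGH"),
  CONF_SCORE    = c(3, 3, 2, 1, 2, 3, 4, 3, 3)
)

new_dataset_final <- new_dataset1 %>%
  group_by(PARTYNUM) %>%
  # Bare column name: evaluated within each PARTYNUM group
  mutate(MAX = max(as.numeric(CONF_SCORE))) %>%
  ungroup()  # optional addition, not required for the fix

new_dataset_final$MAX
# [1] 3 3 3 3 3 3 4 4 4
```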
Dubukay

In base R, we can get the maximum per group (one row per `PARTYNUM`) with `aggregate`:

aggregate(CONF_SCORE ~ PARTYNUM,
          data = new_dataset1, FUN = max)

Or, to add it as a new column, use `ave`:

new_dataset1$MAX <- with(new_dataset1, ave(CONF_SCORE, PARTYNUM, FUN = max))
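
As a rough illustration of the difference between the two calls, the sketch below runs both on the sample values from the question. Only the two relevant columns are rebuilt, and `agg` is just a name chosen here to hold the summary.

```r
# Sample data copied from the question (relevant columns only)
new_dataset1 <- data.frame(
  PARTYNUM   = c("ABC", "ABC", "ABC", "DEF", "DEF", "DEF", "GHI", "GHI", "GHI"),
  CONF_SCORE = c(3, 3, 2, 1, 2, 3, 4, 3, 3)
)

# aggregate() collapses the data to one row per group
agg <- aggregate(CONF_SCORE ~ PARTYNUM, data = new_dataset1, FUN = max)
agg
#   PARTYNUM CONF_SCORE
# 1      ABC          3
# 2      DEF          3
# 3      GHI          4

# ave() returns a vector as long as the input, so it fits as a new column
new_dataset1$MAX <- with(new_dataset1, ave(CONF_SCORE, PARTYNUM, FUN = max))
new_dataset1$MAX
# [1] 3 3 3 3 3 3 4 4 4
```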
akrun