Create a data frame that defines the hierarchy, and assigns each possibility a numeric score.
hi <- data.frame(Poss = unique(df$Set), Nums =c(105,104))
In this case, A
gets a numerical value of 105
, B
gets a numerical score of 104
(so A
would be preferred over B
in the case of a tie).
Join the hierarchy to the original data frame.
require(dplyr)
matched <- left_join(df, hi, by = c("Set"="Poss"))
Then, add a frequency column to your original data frame that lists the number of times each unique Set
-Value
combination occurs.
setDT(matched)[, freq := .N, by = c("Set", "Value")]
Now that those frequencies have been recorded, we only need row of each Set
-Value
combo, so get rid of the rest.
multiplied <- distinct(matched, Set, Value, .keep_all = TRUE)
Now, multiply frequency by the numeric scores.
multiplied$mult <- multiplied$Nums * multiplied$freq
Lastly, sort by Set
first (ascending), then mult
(descending), and use distinct()
to take the highest numerical score for each Value
within each Set
.
check <- multiplied[with(multiplied, order(Set, -mult)), ]
final <- distinct(check, Set, .keep_all = TRUE)
This works because multiple instances of B (numerical score = 104) will be added together (3 instances would give B a total score in the mult
column of 312) but whenever A and B occur at the same frequency, A will win out (105 > 104, 210 > 208, etc.).
If using different numeric scores than the ones provided here, make sure they are spaced out enough for the dataset at hand. For example, using 2 for A
and 1 for B
doesn't work because it requires 3 instances of B to trump A, instead of only 2. Likewise, if you anticipate large differences in the frequencies of A and B, use 1005 and 1004, since A will eventually catch up to B with the scores I used above (200 * 104 is less than 199 * 205).