0

I am trying to find a formula that allows me to find the lowest value for each variable (score...) in a group in order to apply this formula:

relative_socore1= [ team score1 - (worst performing team score 1 (overall)] / [ (Max score 1 per group of teams (e.g. just for group "A")) - worst performing team score1 (overall)]

My data frame has the following structure, with multiple scores....1, 2, 3 and also multiple groups.

data <- data.frame(team= c("blue", "green", "red", "pink", "grey", "black", "rose", "darkblue", "golden", "silver") , group = c("A","A", "B", "C", "D", "D",  "D", "F", "F", "F" ), score1 = c(18, 22, 21, 22, 45, 18, 22, 21, 22, 45), score2= c(10, 20, 21, 92, 40, 18, 20, 21, 20, 45), score3 = c(10, 20, 30, 40, 50, 60, 70, 80, 95, 95)) 

I need to apply this formula to many "scores" in my data frame, changing the "groups" too so that is why I am trying to create a formula instead of calculating it individually.

I know how to get the worst-performing team score 1 (overall),worst_two <- function(x) { min(x, na.rm = T) } but I am struggling a lot to get the value of the best-performing to every group. This part of the formula [ (Max score 1 per group of teams (e.g. just for group "A"))

So far I have managed to make this

test <- function(y, score, data) {
  max.score <- max(data[data$group == y, ]$score, na.rm=T)
  max.score
}

It works when I specify the $score by the specific name (eg. score1), but it doesn't when I replace the value for an "x", "z" or any other value that would help me to create the function.

#not working for "score1"

test <- function(y, z, data) {
  max.score <- max(data[data$group == y, ]$z, na.rm=T)
  max.score
}

max= test( y= "A", z="score1", data = data)

Any help or suggestion would be appreciated!!

EvaMaS
  • 11
  • 5
  • Do you want to get this base R solution working, or would you be open to a `dplyr` solution? – Gregor Thomas Oct 27 '20 at 16:48
  • Your main problem is that you can't use `$` with variables - it does exact matches only. When `z` is a variable, `max(data[data$group == y, ]$z` will not work, but `max(data[data$group == y, z]` will. [See this FAQ for explanation](https://stackoverflow.com/q/18222286/903061). With that, you can probably get things to work... – Gregor Thomas Oct 27 '20 at 16:51
  • However there are much easier ways to got about this. For a start, you could look at the FAQ on [calculating mean by group](https://stackoverflow.com/q/11562656/903061). If you use `sum` instead of `mean`, these answers will do just what you want for that step. Using `dplyr` or `data.table` you could do your entire calculation by group quite easily. – Gregor Thomas Oct 27 '20 at 16:53
  • @GregorThomas sure, a dplyr could also work! – EvaMaS Oct 27 '20 at 16:58

2 Answers2

0

Try this indexing style. Great advice from @GregorThomas. Well, you are using $ to invoke the variable. That is where changing the name is producing issues. Just make this slight change by placing [[z]] instead of $z. That will call the required variable. Here the code:

#Function
test <- function(y, z, data) {
  max.score <- max(data[data$group == y, ][[z]], na.rm=T)
  max.score
}
#Apply
max= test( y= "A", z="score1", data = data)

Output:

max
[1] 22
Duck
  • 39,058
  • 13
  • 42
  • 84
  • Thanks a lot, and also to @GregorThomas I wasn't aware of the restrictions to loop over $. I appreciate this – EvaMaS Oct 27 '20 at 17:05
  • @EvelynaS Yeah, it is about the way you use indexing. Also Thomas gave you a possible way to solve the issue. I hope that helped! – Duck Oct 27 '20 at 17:07
0

Based on my best understanding of your formula, here's a dplyr/tidyr approach: pivot the data to long format, do the grouped calculations, pivot back to wide:

library(dplyr)
library(tidyr)
data %>%
  pivot_longer(cols = starts_with("score"), names_to = "score_number", values_to = "score") %>%
  group_by(score_number) %>%
  mutate(min_overall = min(score)) %>%
  group_by(group, score_number) %>%
  mutate(rel = (score - min_overall) / (max(score) - min_overall)) %>%
  pivot_wider(id_cols = c(team, group), names_from = score_number, values_from = c(score, rel))

# # A tibble: 10 x 8
# # Groups:   group [5]
#    team     group score_score1 score_score2 score_score3 rel_score1 rel_score2 rel_score3
#    <chr>    <chr>        <dbl>        <dbl>        <dbl>      <dbl>      <dbl>      <dbl>
#  1 blue     A               18           10           10      0          0          0    
#  2 green    A               22           20           20      1          1          1    
#  3 red      B               21           21           30      1          1          1    
#  4 pink     C               22           92           40      1          1          1    
#  5 grey     D               45           40           50      1          1          0.667
#  6 black    D               18           18           60      0          0.267      0.833
#  7 rose     D               22           20           70      0.148      0.333      1    
#  8 darkblue F               21           21           80      0.111      0.314      0.824
#  9 golden   F               22           20           95      0.148      0.286      1    
# 10 silver   F               45           45           95      1          1          1    
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294