0

I have a dataframe punkt_tabelle containing points made in a game. Each game has 2 or 3 sets (3 in the MRE). The dataframe contains how the points were made. I also have the scores at the end of the set, which is stored in scores. I calculate the sum for each team in each set. (I did this in total_pts).

What I am trying to achieve is to compare the sum of the points from the datatable (per team and per set) with the points this team made according to the scores. If scores in this set is bigger than the sum calculated as total then I want to add an extra row to the datatable. This new row should contain the teamname, the set and the skill for this new row should be "Opp. Other Errors" and the value for Pkt would be the difference between scores and total. It maybe (and is the case in the MRE), that a new row has to be added for every team and every set.

If you would re-run the total_pts calculation after the new rows were added, it would therefore equal the results in scores.

I tried variations of the code below according to these questions and articles (R Conditional evaluation when using the pipe operator %>%, Inserting a new row to data frame for each group id), but could not work out a solution for my problem.

Here is the last version of my code:

library(dplyr)
library (devtools)

punkt_tabelle <- structure(list(Team = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
                 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
                 1L), .Label = c("Miller/Myer", "Winter/Summer"), class = "factor"), 
                 Skill = structure(c(1L, 1L, 3L, 2L, 2L, 2L, 1L, 1L, 3L, 2L, 
                 2L, 2L, 4L, 4L, 5L, 6L, 6L, 6L, 4L, 4L, 5L, 6L, 6L, 6L), .Label = c("Attack", 
                 "Service", "Block", "Opp. Attack Error", "Opp. Block Error", 
                 "Opp. Serve Error"), class = "factor"), Set = c(2L, 3L, 2L, 
                 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 2L, 1L, 2L, 3L, 
                 1L, 2L, 3L, 1L, 2L, 3L), Pkt = c(2L, 1L, 1L, 0L, 0L, 0L, 
                 3L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 
                 0L, 1L, 0L)), row.names = c(NA, -24L), vars = c("Team", "Skill"
                 ), indices = list(0:1, 2L, 18:19, 20L, 21:23, 3:5, 6:7, 8L, 12:13, 
                 14L, 15:17, 9:11), group_sizes = c(2L, 1L, 2L, 1L, 3L, 3L, 
                 2L, 1L, 2L, 1L, 3L, 3L), biggest_group_size = 3L, labels = structure(list(
                 Team = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                 2L, 2L), .Label = c("Miller/Myer", "Winter/Summer"), class = "factor"), 
                 Skill = c("Attack", "Block", "Opp. Attack Error", "Opp. Block Error", 
                 "Opp. Serve Error", "Service", "Attack", "Block", "Opp. Attack Error", 
                 "Opp. Block Error", "Opp. Serve Error", "Service")), row.names = c(NA, 
                 -12L), class = "data.frame", vars = c("Team", "Skill")), class = c("grouped_df", 
                  "tbl_df", "tbl", "data.frame"))



score_miller_myer <- c(3,6,3) #total points in sets 1, 2, 3
score_winter_summer <- c(5,4,5)
scores <- c(score_miller_myer, score_winter_summer)

#calculate the sum of the points per team and per set
total_pts <- punkt_tabelle %>% group_by(Team, Set) %>% summarize(total = sum(Pkt))
total_pts

#try to compare with the score and add en entry in the dataframe
punkt_tabelle %>% 
  group_by (Team, Set) %>% 
  mutate(total = sum(Pkt)) %>% 
  {if (total<scores) dplyr::bind_rows(Team=Team, Set=Set, Skill="Opp. Other Error", Pkt=(total-scores))}

punkt_tabelle

Can this somehow be done like this? Or do I need to use a loop and do that manually for each set and team? Please help!

EDIT: The expected output in this example would look like this:

Team          Skill               Set   Pkt
<fct>         <fct>             <int> <int>
 1 Miller/Myer   Attack                2     2
 2 Miller/Myer   Attack                3     1
 3 Miller/Myer   Block                 2     1
 4 Miller/Myer   Service               1     0
 5 Miller/Myer   Service               2     0
 6 Miller/Myer   Service               3     0
 7 Winter/Summer Attack                1     3
 8 Winter/Summer Attack                2     1
 9 Winter/Summer Block                 3     1
10 Winter/Summer Service               1     0
11 Winter/Summer Service               2     1
12 Winter/Summer Service               3     1
13 Winter/Summer Opp. Attack Error     2     0
14 Winter/Summer Opp. Attack Error     3     0
15 Winter/Summer Opp. Block Error      2     0
16 Winter/Summer Opp. Serve Error      1     0
17 Winter/Summer Opp. Serve Error      2     1
18 Winter/Summer Opp. Serve Error      3     1
19 Miller/Myer   Opp. Attack Error     1     1
20 Miller/Myer   Opp. Attack Error     2     0
21 Miller/Myer   Opp. Block Error      3     0
22 Miller/Myer   Opp. Serve Error      1     0
23 Miller/Myer   Opp. Serve Error      2     1
24 Miller/Myer   Opp. Serve Error      3     0
25 Winter/Summer Opp. Other Error      1     2  #here start the added rows
26 Winter/Summer Opp. Other Error      2     1  
27 Winter/Summer Opp. Other Error      3     2
28 Miller/Myer   Opp. Other Error      1     2
29 Miller/Myer   Opp. Other Error      2     2
30 Miller/Myer   Opp. Other Error      3     2

Further explanation of the problem: A team scores in various ways. Either they score themselves (Attack, Serve, Block) or their opponent makes a mistake (Opp. Attack Error, Opp. Serve Error, Opp. Block Error). Still that leaves some difference to the total score they reach, because there are some errors of the opponents which are not specified. For this I want to add a row "Opp. Other errors" after calculating the difference.

Example: in row 26: the value of Pkt is 1 because in total_pts in Set 2 team Winter/Summer has 3 points. But their score according to score_winter_summerin set 2 is 4 points. So there is a difference of 1 point which is added in the new row.

empi75
  • 77
  • 1
  • 7
  • I am not sure I understand what you're trying to do. Could you please edit your post and include the expected output for the sample data you give? – Maurits Evers Dec 05 '18 at 23:32
  • @MauritsEvers: I edited the question and hope it is clearer now, I also added the expected output. – empi75 Dec 05 '18 at 23:44

1 Answers1

0

Here is a possibility.

  1. First, we need to store scores in a data.frame that includes information about the Team and Set

    df.scores <- data.frame(
        Team = c(rep("Miller/Myer", 3), rep("Winter/Summer", 3)),
        Set = 1:3,
        scores = scores)
    

    Let's inspect df.scores

    df.scores
    #           Team Set scores
    #1   Miller/Myer   1      3
    #2   Miller/Myer   2      6
    #3   Miller/Myer   3      3
    #4 Winter/Summer   1      5
    #5 Winter/Summer   2      4
    #6 Winter/Summer   3      5
    
  2. Next, we do a left join of punk_tabelle with df.scores by Team and Set, calculate the total points Total = sum(Pkt) by Team and Set; the Opp. Other Error is then given by the difference between scores and Total. The final expected output is achieved by doing a long to wide to long conversion.

    punkt_tabelle %>%
        left_join(df.scores) %>%
        group_by(Team, Set) %>%
        mutate(
            Total = sum(Pkt),
            `Opp. Other Error` = scores - Total) %>%
        spread(Skill, Pkt) %>%
        select(-scores, -Total) %>%
        gather(Skill, Pkt, -Team, -Set)
    #Joining, by = c("Team", "Set")
    ## A tibble: 42 x 4
    ## Groups:   Team, Set [6]
    #   Team            Set Skill              Pkt
    #   <fct>         <int> <chr>            <dbl>
    # 1 Miller/Myer       1 Opp. Other Error     2
    # 2 Miller/Myer       2 Opp. Other Error     2
    # 3 Miller/Myer       3 Opp. Other Error     2
    # 4 Winter/Summer     1 Opp. Other Error     2
    # 5 Winter/Summer     2 Opp. Other Error     1
    # 6 Winter/Summer     3 Opp. Other Error     2
    # 7 Miller/Myer       1 Attack              NA
    # 8 Miller/Myer       2 Attack               2
    # 9 Miller/Myer       3 Attack               1
    #10 Winter/Summer     1 Attack               3
    ## ... with 32 more rows
    
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • 1
    Thank you! nice - avoiding the condition. for completeness and possible future readers: `library (tidyr)` needs to be included in the code. – empi75 Dec 06 '18 at 10:04