Finding the sum in R based off different variables

Question

I have a dataset that looks like this:

match_id batting_team ball over total_runs
1        Team_X       1    1    2
1        Team_X       2    1    0
1        Team_X       3    1    1
1        Team_X       4    1    0
1        Team_X       5    1    2
1        Team_X       6    1    2
1        Team_X       1    2    2
1        Team_X       2    2    0
1        Team_X       3    2    1
1        Team_X       4    2    0
1        Team_X       5    2    2
1        Team_X       6    2    2

The data continues in the same way, giving the runs scored off each ball by each team in each over. I would like to add a column showing the number of runs scored in each over, for every over by every team in every match. The aim is then to fit and plot a linear regression model of the number of runs in an over against which over of the match it is. Does anyone have any advice?

To better understand what you are looking for, could you post the expected result for the sample data you provided? (comment by zoowalk, Dec 03 '20 at 11:22)

score 0 · Answer 1 · answered Dec 03 '20 at 11:36

You can get the desired output, the runs made in each over for each match and each team, using the dplyr package. For the regression, I would recommend posting a question on Cross Validated.

library(dplyr)
# Total runs per over, for each match and batting team
df %>% group_by(matchid, bat_team, over) %>%
  summarise(over_run = sum(runs), .groups = "drop")
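The question asks to add a column rather than collapse the data to one row per over. A minimal sketch of that variant, assuming the column names from the question's sample (match_id, batting_team, over, total_runs): using mutate() after group_by() keeps every ball and repeats the over total on each of its rows. The data frame name balls and the column name over_runs are illustrative choices, not from the original post.

```r
library(dplyr)

# The question's sample data, one row per ball
balls <- data.frame(
  match_id     = 1,
  batting_team = "Team_X",
  ball         = rep(1:6, times = 2),
  over         = rep(1:2, each = 6),
  total_runs   = c(2, 0, 1, 0, 2, 2, 2, 0, 1, 0, 2, 2)
)

# Add the per-over total to every ball of that over, keeping all rows
balls <- balls %>%
  group_by(match_id, batting_team, over) %>%
  mutate(over_runs = sum(total_runs)) %>%
  ungroup()

balls
```

Use summarise() as in the answer when you want one row per over (for example as input to the regression), and mutate() when you want the over total alongside each ball.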

Data

set.seed(1)  # make the simulated runs reproducible
df <- data.frame(matchid = rep(1:2, 20), bat_team = rep(c("A", "B"), each = 20),
                 ball = rep(1:6, length.out = 40), over = rep(1:4, each = 10),
                 runs = sample(1:6, 40, replace = TRUE))
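For the regression part of the question, a minimal sketch using base R's lm() on the per-over totals, built from the simulated data above (regenerated here so the block runs on its own; set.seed() and the names per_over and fit are additions for illustration). It only shows the mechanics; whether a straight line is a sensible model for runs per over is the statistical question the answer suggests asking on Cross Validated.

```r
library(dplyr)

# Simulated ball-by-ball data, as in the answer
set.seed(1)
df <- data.frame(matchid = rep(1:2, 20), bat_team = rep(c("A", "B"), each = 20),
                 ball = rep(1:6, length.out = 40), over = rep(1:4, each = 10),
                 runs = sample(1:6, 40, replace = TRUE))

# One row per match, team and over, holding that over's total runs
per_over <- df %>%
  group_by(matchid, bat_team, over) %>%
  summarise(over_run = sum(runs), .groups = "drop")

# Linear model: runs in an over as a function of the over number
fit <- lm(over_run ~ over, data = per_over)
summary(fit)

# Scatter plot of the per-over totals with the fitted line
plot(over_run ~ over, data = per_over)
abline(fit)
```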
