I have a dataset that looks like this:
match_id batting_team ball over total_runs
1 Team_X 1 1 2
1 Team_X 2 1 0
1 Team_X 3 1 1
1 Team_X 4 1 0
1 Team_X 5 1 2
1 Team_X 6 1 2
1 Team_X 1 2 2
1 Team_X 2 2 0
1 Team_X 3 2 1
1 Team_X 4 2 0
1 Team_X 5 2 2
1 Team_X 6 2 2
The data continues in the same way, showing the runs scored off each ball, by each team, in each over. I would like to add a column showing the total number of runs scored in each over, for every team in every match. The aim is then to fit and plot a linear regression model of runs per over against the over number within a match. Does anyone have any advice?
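
For reference, here is a minimal sketch of the kind of result I am after. I wrote it in Python with pandas, which is only an assumption, since any tool would do. The names `df`, `over_runs` and `per_over` are made up for illustration, and the rows are just the sample shown above. It sums `total_runs` within each (match_id, batting_team, over) group and then fits a straight line of runs per over against over number.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above (hypothetical DataFrame name `df`).
df = pd.DataFrame({
    "match_id": [1] * 12,
    "batting_team": ["Team_X"] * 12,
    "ball": [1, 2, 3, 4, 5, 6] * 2,
    "over": [1] * 6 + [2] * 6,
    "total_runs": [2, 0, 1, 0, 2, 2, 2, 0, 1, 0, 2, 2],
})

# An over is identified by match, batting team and over number.
keys = ["match_id", "batting_team", "over"]

# New column: total runs in the over, repeated on every ball of that over.
df["over_runs"] = df.groupby(keys)["total_runs"].transform("sum")

# One row per (match, team, over), which is the shape a regression needs.
per_over = (
    df.groupby(keys, as_index=False)["total_runs"]
    .sum()
    .rename(columns={"total_runs": "over_runs"})
)

# Ordinary least-squares line: over_runs ~ slope * over + intercept.
slope, intercept = np.polyfit(per_over["over"], per_over["over_runs"], deg=1)

print(per_over)
print(slope, intercept)
```

The `transform("sum")` call keeps the ball-by-ball rows and adds the per-over total alongside them, while `per_over` collapses to one row per over so that each over counts only once in the regression.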