I have this data set of batting data from the GameDay servers:
eliasID teamID gameID gameDate h hr bb so rbi ab runs t d lob sb cs sf hbp
1 430203 kca 2010/04/01/arimlb-kcamlb-1 4/1/2010 1 2 0 0 0 0 0 0 0 0
2 459714 kca 2010/04/01/arimlb-kcamlb-1 4/1/2010 1 0 0 1 0 3 1 0 0 1 0 0 0 0
3 325392 kca 2010/04/01/arimlb-kcamlb-1 4/1/2010 0 0 1 0 0 1 0 0 0 0 0 0 0 0
4 429801 kca 2010/04/01/arimlb-kcamlb-1 4/1/2010 0 0 0 1 0 3 0 0 0 2 0 0 0 0
5 456714 kca 2010/04/01/arimlb-kcamlb-1 4/1/2010 0 0 1 0 0 4 0 0 0 2 0 0 0 0
6 150449 kca 2010/04/01/arimlb-kcamlb-1 4/1/2010 0 0 0 1 1 4 0 0 0 2 0 0 0 0
ba ID gameDateFormat year Year
1 0.345 1 2010-04-01 2010 NA
2 0.250 2 2010-04-01 2010 NA
3 0.319 3 2010-04-01 2010 NA
4 0.327 4 2010-04-01 2010 NA
5 0.333 5 2010-04-01 2010 NA
6 0.217 6 2010-04-01 2010 NA
My issue is that I'd like to build a running total of at bats (ab) for each game, but total only those at bats from games with gameDate lower than the row's gameDate, and with games in the same gameYear.
I've look at the for loop and dplyr has been suggested, but these all want to sum all the ab column for one player, when I need an ongoing sum added to each game to show the player's ab total for the year so far at that game.
I'm attempting to build the equivalent of the kind of statistics you see on baseball-ref.com.
In English, I'm looking for:
For each EliasID, gameID in Batting: sum(ab) for the EliasID where the gameDate < this row's gameDate and the gameYear = this row's gameYear
What do you think?