Background
Below is my gamedata dataset in dput form -- it contains line scores for some MLB games.
structure(list(team = c("NYM", "NYM", "BOS", "NYM", "BOS"), linescore = c("010000000",
"(10)1140006x", "002200010", "00000(11)01x", "311200"), ondate = structure(c(18475,
18476, 18487, 18489, 18494), class = "Date")), class = "data.frame", row.names = c(NA,
-5L))
For example, here is one line score: "002200010".
Some line scores end with an 'x' and some have a two-digit value in parentheses, as in "00000(11)01x". Each digit not in parentheses indicates how many runs the team scored in the inning. If a team scores more than nine runs in an inning, the number is placed in parentheses, so in the line score "00000(11)01x" the team scored 11 runs in the sixth inning and did not come to bat in the bottom of the ninth (shown by the 'x').
Not every line score has nine innings. Some have more and some have as few as six.
What I need to do
First, what I need to do is to get how many runs a team scored in each inning, e.g., the first, second, third, and so on, and put each runs scored in a new column. I would prefer for the solution to use dplyr.
I have reviewed stackoverflow's suggested solutions, but found none that matched what I need. If there is one, I would appreciate it if you would share its URL.
I have tried to do it using this code:
gamedata %>%
select(ondate, team, linescore) %>%
mutate(inng1 = str_extract(linescore, "\\d|\\(\\d{2}\\)"))
Here is the output:
ondate team linescore inng1
2020-08-01 NYM 010000000 0
2020-08-02 NYM (10)1140006x (10)
2020-08-13 BOS 002200010 0
2020-08-15 NYM 00000(11)01x 0
2020-08-20 BOS 311200 3
Second, how can I remove the parentheses in the inng1 column for '10'?
The code below produced the error beneath it:
gamedata %>%
select(ondate, team, linescore) %>%
mutate(inng1 = str_extract(linescore, "\\d|\\(\\d{2}\\)"))
str_remove_all(inng1,"[()]")
This is the error message I got:
"Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), : object 'inng1' not found"
Third, I need to know how to extract the runs scored for each additional inning, starting with the second, putting each value in its own column, e.g, inng2, inng3, and so one.
At the end, I should have the output shown above (without parentheses for each double-digit inning) with a column for each inning, so there would be a column titled "inng1," "inng2," inng3", "inng4", and so on. The data in the inning columns needs to be numeric as later I will be summing them.