1

I have this dataframe here:

smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", 
"Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", 
"P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", 
"Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", 
"Intercepted Pass", "Bad Pass", "Good Pass"), Value = c(2, 2, 
2, -3, 2, 2, 2, 2, -2, 1, -2, 2), `Score Sum` = c(2, 4, 6, 3, 
2, 4, 6, 8, 6, 1, -1, 2)), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

It contains 4 columns and 12 rows. In the 3rd column are values that were assigned based on the event. In the 4th column, I am trying to add the values to get a rolling sum. So for every time the Opponent had an event, their current value would be added to their previous score sum, and similar for P1/P2/P3. I have been able to roll the sums to how I expect it to be up until row 10.

I have the following code written here:

for (i in 1:nrow(smallerDF)) {
  #print(i)
  if (smallerDF$Event[i] == "Good Pass") {
    smallerDF$Value[i] <- 2
  }
  
  if (smallerDF$Event[i] == "Bad Pass") {
    smallerDF$Value[i] <- -2
  }
  
  if (smallerDF$Event[i] == "Intercepted Pass") {
    smallerDF$Value[i] <- 1
  }
  
  if (smallerDF$Event[i] == "Turnover") {
    smallerDF$Value[i] <- -3
  }
  
  if (smallerDF$category[i] == "Opponent") {
    #print(i)
    if (i != 1 && smallerDF$category[i-1] == "Opponent") {
      smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
    }
  }
  else if (smallerDF$category[i] %in% dfList) {
    if (i != 1 && smallerDF$category[i-1] %in% dfList) {
      smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
    }
  }
}

This works up until row 10 since I am using [i-1], but I can't figure out how to get row 10 to reference back to row 4 (the last time Opponent was used) to add cell [10,3] onto cell [4,4].

The final result should look like

category Event            Value `Score Sum`
   <chr>    <chr>            <dbl>       <dbl>
 1 Opponent Good Pass            2           2
 2 Opponent Good Pass            2           4
 3 Opponent Good Pass            2           6
 4 Opponent Turnover            -3           3
 5 P1       Good Pass            2           2
 6 P2       Good Pass            2           4
 7 P3       Good Pass            2           6
 8 P2       Good Pass            2           8
 9 P2       Bad Pass            -2           6
10 Opponent Intercepted Pass     1           4
11 Opponent Bad Pass            -2           2
12 P1       Good Pass            2           8

I tried incorporating the use of this code

dt <- data.table(smallerDF)
newDT <- dt[ , .SD[.N] ,  by = c("category") ]

but this only returns the very last row for each different value in category, and not the latest/previous occurrence of the category.

Any help would be greatly appreciated. Thanks

r2evans
  • 141,215
  • 6
  • 77
  • 149
samrizz4
  • 516
  • 5
  • 11
  • For row 12 the score sum is 8, shouldn't it be 4? – Nad Pat Sep 01 '21 at 18:11
  • The 1 in cell [10,4] from dput is just my code adding the 1 in [10,3] into [10,4]. Cell [10,4] should be adding the 1 in cell [10,3] with the 3 in cell [4,4] to get 4 in the expected output. Doing this adding referencing back to [4,4] the last time Opponent was used is what is challenging me. Does that make more sense? @akrun – samrizz4 Sep 01 '21 at 18:11
  • @NadPat Ultimately yes it should be 4, but I was trying to make the question simpler by adding the P1/P2/P3 together, so the P1 value in [12,3] will get added to P2's score sum in [9,4] – samrizz4 Sep 01 '21 at 18:14
  • @akrun I'm not exactly looking to group the rows together since the larger DF is over time, and I'd like to see the Score Sum (column 4) trends as time goes on to compare the P1/2/3 and Opponent scores – samrizz4 Sep 01 '21 at 18:17

2 Answers2

2

I think the base premise here is a grouped-calculation (not easy in a for loop), and it should be grouped on whether category is "Opponnent" or not (lumping "P1", "P2", etc, together).

Data prep: starting with just the first two columns of your dataset above:

smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", "P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", "Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", "Intercepted Pass", "Bad Pass", "Good Pass")), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))

I'll add a "time" column: some utilities (e.g., base::merge) do not honor row-order, despite some best efforts. I think it's generally safer anyway to have a "time" component in there to remove accidental reordering. Neither the data.table nor dplyr solutions below reorder it inadvertently, but it's still not a horrible idea.

smallerDF$time <- seq_len(nrow(smallerDF))

base R

This is perhaps the least-intuitive of the three, since the grouping functions in R can seem daunting. These include ave, aggregate, by, tapply, etc. I'll stick with ave for now, as it is the simplest and perhaps easiest to read.

First, we'll create a "merge/join" table for Value (other ways exist to bring in these values, see https://stackoverflow.com/a/68999591/3358272; @ViníciusFélix's answer is a great example using case_when for this purpose). Second, we'll aggregate by "Opponent vs NotOpponent".

values <- data.frame(
  Event = c("Good Pass", "Bad Pass", "Intercepted Pass", "Turnover"),
  Value = c(2, -2, 1, -3)
)
smallerDF2 <- merge(smallerDF, values, by = "Event", all.x = TRUE, sort = FALSE)
## feel free to verify that `smallerDF2` is no longer in the original order,
## despite `sort=FALSE`. Order is not guaranteed with `base::merge`, period.
smallerDF2 <- smallerDF2[order(smallerDF2$time),]
smallerDF2
#               Event category time Value
# 1         Good Pass Opponent    1     2
# 2         Good Pass Opponent    2     2
# 3         Good Pass Opponent    3     2
# 9          Turnover Opponent    4    -3
# 5         Good Pass       P1    5     2
# 6         Good Pass       P2    6     2
# 7         Good Pass       P3    7     2
# 4         Good Pass       P2    8     2
# 10         Bad Pass       P2    9    -2
# 12 Intercepted Pass Opponent   10     1
# 11         Bad Pass Opponent   11    -2
# 8         Good Pass       P1   12     2
smallerDF2$`Score Sum2` <- ave(smallerDF2$Value, smallerDF2$category == "Opponent", FUN = cumsum)
smallerDF2
#               Event category time Value Score Sum2
# 1         Good Pass Opponent    1     2          2
# 2         Good Pass Opponent    2     2          4
# 3         Good Pass Opponent    3     2          6
# 9          Turnover Opponent    4    -3          3
# 5         Good Pass       P1    5     2          2
# 6         Good Pass       P2    6     2          4
# 7         Good Pass       P3    7     2          6
# 4         Good Pass       P2    8     2          8
# 10         Bad Pass       P2    9    -2          6
# 12 Intercepted Pass Opponent   10     1          4
# 11         Bad Pass Opponent   11    -2          2
# 8         Good Pass       P1   12     2          8

data.table

library(data.table)
smallerDT <- as.data.table(smallerDF)
smallerDT[values, Value := Value, on = .(Event)]
smallerDT[, `Score Sum2` := cumsum(Value), by = .(category == "Opponent")]

dplyr

library(dplyr)
left_join(smallerDF, values, by = "Event") %>%
  group_by(g = (category == "Opponent")) %>%
  mutate(`Score Sum` = cumsum(Value)) %>%
  ungroup() %>%
  select(-g)
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Thanks - your base R version seemed to make the most sense and the dplyr kept giving me an error. However, using the base R version, is it possible to aggregate a different way? In addition to Opponent vs Non-Opponent, I was also planning on summing based on Opponent vs P1 vs P2 vs P3. Is this a possible edit with the way that was written? – samrizz4 Sep 02 '21 at 01:54
  • 1
    *"In addition to"* suggests multiple runs since you cannot aggregate in two ways in one step. My guess, though, is change `ave(..., smallerDF2$category == "Opponent", ...)` to `ave(..., smallerDF2$category, ...)` and see if it's what you want. If not, please edit your question with your updated expectation. – r2evans Sep 02 '21 at 12:35
  • 1
    Thanks! Using ```ave(..., smallerDF2$category, ...)``` worked and showed what I wanted. – samrizz4 Sep 02 '21 at 15:19
0

Here is a tidyverse solution

smallerDF %>% 
  #Removing original values from your data
  select(-Value,-`Score Sum`) %>% 
  #Creating Value variable with case_when
  mutate(
    Value = case_when(
      Event == "Good Pass" ~ 2,
      Event == "Bad Pass" ~ -2,
      Event == "Intercepted Pass" ~ 1,
      Event == "Turnover" ~ -3
    ),
    #Creating auxiliar logical variable (opponent or not oppponent)
    Opponent = if_else(category == "Opponent",TRUE,FALSE)
  ) %>% 
  #Creating cumulative sum by either Opponent or not oppponent
  group_by(Opponent) %>% 
  mutate(`Score sum` = cumsum(Value))

-output

 A tibble: 12 x 4
   category Event            Value `Score Sum`
   <chr>    <chr>            <dbl>       <dbl>
 1 Opponent Good Pass            2           2
 2 Opponent Good Pass            2           4
 3 Opponent Good Pass            2           6
 4 Opponent Turnover            -3           3
 5 P1       Good Pass            2           2
 6 P2       Good Pass            2           4
 7 P3       Good Pass            2           6
 8 P2       Good Pass            2           8
 9 P2       Bad Pass            -2           6
10 Opponent Intercepted Pass     1           1
11 Opponent Bad Pass            -2          -1
12 P1       Good Pass            2           2
Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32