1

I have kinda a dumb r question. I'm trying to aggregate the total number of events in the same week (Year_Week) by location (Field). The below screenshot of my table is similar to my csv file. I was already able to sum the Score by Field and Year_Week using the following code: aggregate(x = df$Score, by = list(df$Year_Week, df$Field), FUN = sum)

Table of Games

However, I'm not sure about how to sum the events by Year_Week and Field. For example, because 2 games occurred during the week of 2019-4 in Irvine, I'd like my output to show "2". Similarly, because three games occurred the week of 2019-2, I want my output to show "3". I'd like my output table to look something like this: output table

where sum_Game is the number of games that occurred in one location (Field) in a given Year_Week, and sum_Score is the sum of the scores of those games. Thanks in advance!

ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • [Count number of rows within each group](https://stackoverflow.com/questions/9809166/count-number-of-rows-within-each-group); [How to sum a variable by group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) – Henrik Aug 15 '21 at 22:04

2 Answers2

2

Using data.table, convert the data to data.table (setDT), grouped by 'Field', 'Year_Week', and get the number of observations (.N) and sum of 'Score' to return a summarised dataset with columns 'sum_Game' and 'sum_Score'

library(data.table)
setDT(df)[, .(sum_Game = .N, sum_Score = sum(Score)), by = .(Field, Year_Week)]

NOTE: The sum_Game would not be a good column name as this gives the indication that we are summing up the 'Game_id' column which we are not. Instead, it is just returning the number of games.

In short, if we are careful in naming objects, column names etc., there wouldn't be much need for detailed description about the code

akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks so much, akrun! That code is perfect for summing up the scores by week and location. I really appreciate it. By chance, would you also know how to sum up the number of games by week and location, so that it's similar to the third column 'sum_Game' in my table? Thanks again for your help! – Dylan Forrester Aug 15 '21 at 20:02
  • @DylanForrester please find the updated one – akrun Aug 15 '21 at 20:04
  • 1
    Thanks, akrun! This is great. And will definitely be more careful about naming my columns in the future. Thanks again! – Dylan Forrester Aug 15 '21 at 20:19
1

Are you looking for such a solution?

With aggregate

aggregate(.~Year_Week+Field, df, sum)

With dplyr

library(dplyr)
df %>% 
    group_by(Year_Week, Field) %>% 
    summarize(sumScour = sum(Score))
  Year_Week Field   sumScour
  <chr>     <chr>      <int>
1 2019-12   Irvine         8
2 2019-2    Anaheim        4
3 2019-24   Anaheim        1
4 2019-4    Irvine        11
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 1
    Hi TarJae, thanks so much for you quick reply! Yes, something very similar to that. I'll update my original question to include the output table i'm trying to get. Thanks again! – Dylan Forrester Aug 15 '21 at 17:34