I have a dataset with 4 columns which looks like that:
City | Year | Week | Average |
---|---|---|---|
Guelph | 2020 | 2020-04-12 | 28.3 |
Hamilton | 2020 | 2020-04-12 | 10.7 |
Waterloo | 2020 | 2020-04-12 | 50.1 |
Guelph | 2020 | 2020-04-20 | 3.5 |
Hamilton | 2020 | 2020-04-20 | 42.9 |
I would like to sum the average column for the same week. In other words, I want to create a new dataset with three columns (Year, week, Average) where I won't have 3 different rows for the same week but only one (e.g instead of having three times 20220-04-12, I will have it one) and the corresponding cell in the average column will be the sum of all the rows that correspond to the same week. Something like that:
Year | Week | Average |
---|---|---|
2020 | 2020-04-12 | 89.1 |
2020 | 2020-04-20 | 46.4 |
where 89.1 is the sum of the first three rows that are at the same week and 46.4 is the sum of the last two rows of the initial table that correspond to the same week (2020-04-20).
The code I am using for that looks like that:
data_set <- data_set %>%
select(`Year`, `Week`, `Average`) %>%
group_by(Year, Week) %>%
summarize(Average = sum(Average))
but for some weeks I am getting back NAs and for some other I get the correct sum I want. The data are all numeric and in the initial dataset there are some NA values on the Average column.
Thanks in advance