Conditional Sum of a column in R with NAs

Question

I have a dataset with 4 columns which looks like that:

City	Year	Week	Average
Guelph	2020	2020-04-12	28.3
Hamilton	2020	2020-04-12	10.7
Waterloo	2020	2020-04-12	50.1
Guelph	2020	2020-04-20	3.5
Hamilton	2020	2020-04-20	42.9

I would like to sum the average column for the same week. In other words, I want to create a new dataset with three columns (Year, week, Average) where I won't have 3 different rows for the same week but only one (e.g instead of having three times 20220-04-12, I will have it one) and the corresponding cell in the average column will be the sum of all the rows that correspond to the same week. Something like that:

Year	Week	Average
2020	2020-04-12	89.1
2020	2020-04-20	46.4

where 89.1 is the sum of the first three rows that are at the same week and 46.4 is the sum of the last two rows of the initial table that correspond to the same week (2020-04-20).

The code I am using for that looks like that:

data_set <- data_set %>%
    select(`Year`, `Week`, `Average`) %>%
    group_by(Year, Week) %>% 
    summarize(Average = sum(Average))

but for some weeks I am getting back NAs and for some other I get the correct sum I want. The data are all numeric and in the initial dataset there are some NA values on the Average column.

Thanks in advance

You may want `sum(Average, na.rm = TRUE)`. – neilfws Mar 10 '21 at 22:06 — neilfws, Mar 10 '21 at 22:06

score 1 · Accepted Answer · answered Mar 10 '21 at 22:17

You can accomplish this by passing in na.rm = TRUE to sum. Also, since you group_by(Year, Week), there isn't much to gain with using select in this case since you are generating a summary statistic on the Average variable within summarise.

df <- structure(list(City = c("Guelph", "Hamilton", "Waterloo", "Guelph", 
"Hamilton"), Year = c(2020L, 2020L, 2020L, 2020L, 2020L), Week = c("2020-04-12", 
"2020-04-12", "2020-04-12", "2020-04-20", "2020-04-20"), Average = c(28.3, 
10.7, 50.1, 3.5, 42.9)), class = "data.frame", row.names = c(NA, 
-5L))

library(dplyr)

df %>% 
  mutate(
    Week = as.Date(Week),
  ) %>% 
  group_by(Year, Week) %>% 
  summarise(
    Average = sum(Average, na.rm = TRUE)
  )

#> # A tibble: 2 x 3
#> # Groups:   Year [1]
#>    Year Week       Average
#>   <int> <date>       <dbl>
#> 1  2020 2020-04-12    89.1
#> 2  2020 2020-04-20    46.4

^{Created on 2021-03-10 by the reprex package (v0.3.0)}

You're welcome. For future questions, please read up on how to provide a properly formatted [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). For example, to produce a minimal data set, you can use `head()`, `subset()`, or the indices. Then use `dput()` to give us something that can be put in R immediately. — Eric, Mar 10 '21 at 23:12

Conditional Sum of a column in R with NAs

1 Answers1