R: Sum variable values conditional on value of other variable

Question

I have a data frame which looks like this:

  year country inhabitants
1    1       A          15
2    2       A          10
3    3       A          21
4    1       B          76
5    2       B          69
6    3       B          58
7    1       C         120
8    2       C         131
9    3       C         128

Now I would like to add a column with the sum of "inhabitants" for each year, over all countries, i.e. the result should look like:

  year country inhabitants sum_inhabitants
1    1       A          15             211
2    2       A          10             210
3    3       A          21             207
4    1       B          76             211
5    2       B          69             210
6    3       B          58             207
7    1       C         120             211
8    2       C         131             210
9    3       C         128             207
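A reproducible version of the example data might look like the sketch below. It assumes country A's year-3 value is 21, which matches the expected output (211, 210 and 207 are the per-year totals). `tapply()` is used only to check those totals before adding them as a column.

```r
# Rebuild the example data frame from the question.
# Assumption: country A has 21 inhabitants in year 3, as in the expected output.
df <- data.frame(
  year        = rep(1:3, times = 3),
  country     = rep(c("A", "B", "C"), each = 3),
  inhabitants = c(15, 10, 21, 76, 69, 58, 120, 131, 128)
)

# Per-year totals that the new column should contain
totals <- tapply(df$inhabitants, df$year, sum)
print(totals)
#   1   2   3
# 211 210 207
```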

My original data frame contains many more observations, which is why I can't do the computation by hand.

It would be great if you could supply a minimal reproducible example to go along with your question, something we can work from and use to show you how it might be possible to answer it. That way others can also benefit from your question, and the accompanying answer, in the future. You can have a look at [this SO post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a great reproducible example in R. Also, please outline what you have already tried. — Eric Fail, Jan 14 '16 at 15:59

score 0 · Accepted Answer · answered Jan 14 '16 at 16:04

We can use ave to sum by year without any outside packages. Its advantage over aggregate is that it does not collapse the data to one row per year; instead it fills the group total into every row of the original data frame:

df$sum_inhabitants <- ave(df$inhabitants, df$year, FUN=sum)
# year country inhabitants sum_inhabitants
# 1    1       A          15             211
# 2    2       A          10             210
# 3    3       A          21             207
# 4    1       B          76             211
# 5    2       B          69             210
# 6    3       B          58             207
# 7    1       C         120             211
# 8    2       C         131             210
# 9    3       C         128             207
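To make the contrast with aggregate concrete, here is a small base R sketch (data rebuilt from the question, with 21 for country A in year 3). aggregate() returns one row per year, so getting the totals back onto every row needs an extra merge(), which also reorders the rows by year; ave() gives the same values in one step and keeps the original row order.

```r
df <- data.frame(
  year        = rep(1:3, times = 3),
  country     = rep(c("A", "B", "C"), each = 3),
  inhabitants = c(15, 10, 21, 76, 69, 58, 120, 131, 128)
)

# aggregate() collapses the data: one row per year
totals <- aggregate(inhabitants ~ year, data = df, FUN = sum)
names(totals) <- c("year", "sum_inhabitants")

# Merging the totals back gives every row its year's sum, but sorted by year
merged <- merge(df, totals, by = "year")
merged <- merged[order(merged$country, merged$year), ]

# ave() does the same in one step and preserves the original row order
df$sum_inhabitants <- ave(df$inhabitants, df$year, FUN = sum)
```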

Perfect, thanks a lot! Is there a way to ignore NAs in "inhabitants"? — Theresa, Jan 14 '16 at 16:31
Yes. `ave(df$inhabitants, df$year, FUN = function(x) sum(x, na.rm = TRUE))` — Pierre L, Jan 14 '16 at 16:35
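The anonymous function is needed because ave() treats any extra arguments in `...` as grouping variables, so `na.rm = TRUE` cannot simply be passed alongside `FUN = sum`. A minimal sketch on hypothetical data with one missing value:

```r
# Hypothetical data: year 1 contains a missing value
df <- data.frame(
  year        = c(1, 1, 2, 2),
  inhabitants = c(15, NA, 10, 69)
)

# Plain sum: the NA propagates to every row of its group
df$sum_default <- ave(df$inhabitants, df$year, FUN = sum)

# Wrapping sum() lets us drop NAs within each group
df$sum_na_rm <- ave(df$inhabitants, df$year,
                    FUN = function(x) sum(x, na.rm = TRUE))
print(df)
```

Here year 1 gets NA with the plain sum but 15 with `na.rm = TRUE`; year 2 is 79 either way.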

score 0 · Answer 2 · answered Jan 14 '16 at 16:05

Using the dplyr package, you can do something like this:

library(dplyr)
df %>% group_by(year) %>% summarise(sum_inhabitants = sum(inhabitants))

If you want to keep every row and add the sums as a new column of the original data frame (so each year's total is repeated on every row of that year), replace summarise above with mutate; that gives exactly the output you specified.
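A sketch of that mutate variant, assuming dplyr is installed and using the question's data (with 21 for country A in year 3). The trailing ungroup() is a defensive choice so later operations on the result are not silently grouped by year.

```r
library(dplyr)

df <- data.frame(
  year        = rep(1:3, times = 3),
  country     = rep(c("A", "B", "C"), each = 3),
  inhabitants = c(15, 10, 21, 76, 69, 58, 120, 131, 128)
)

# mutate() keeps all 9 rows and repeats each year's total on every row of that year
df <- df %>%
  group_by(year) %>%
  mutate(sum_inhabitants = sum(inhabitants)) %>%
  ungroup()
```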

If you want to get it by year and by country, you can do this:

df %>% group_by(year, country) %>% summarise(sum_inhabitants = sum(inhabitants))
