Sum up observations from data frame in R (multiple conditions)

Question

I'm currently facing the following issue and would highly appreciate any help. My data frame looks like this

country_birth   year    migrants   live_in                gender
Albania         2000      1        Australia           male
Germany         2000      2        Australia           female
Albania         2008      3        Australia           male
Albania         2000      6        Australia           female
Germany         2004      2        Australia           female
UK               2004      2         Germany           female
US               2004      5          UK                male

Now I would like to get the sum of migrants (both gender) for the same country of birth and the same live_in country for a matching year. A new dataframe should look something like this


country_birth   year    total_migrants   live_in                
Albania         2000      7              Australia           
...             ...      ...                ...

Many thanks in advance!

score 4 · Accepted Answer · answered Jun 01 '21 at 07:48

You can try aggregate + subset like below

> aggregate(migrants ~ ., subset(df, select = -gender), sum)
  country_birth year   live_in migrants
1       Albania 2000 Australia        7
2       Germany 2000 Australia        2
3       Germany 2004 Australia        2
4       Albania 2008 Australia        3
5            UK 2004   Germany        2
6            US 2004        UK        5

where

subset omits the columns gender
aggregate helps you aggregate migrants, grouped by all other columns.

score 3 · Answer 2 · answered Jun 01 '21 at 07:52

library(tidyverse)
data %>% 
  count(country_birth, year, live_in, wt = migrants, name = "total_migrants")


# # A tibble: 6 x 4
#   country_birth  year live_in   total_migrants
#   <chr>         <dbl> <chr>              <dbl>
# 1 Albania        2000 Australia              7
# 2 Albania        2008 Australia              3
# 3 Germany        2000 Australia              2
# 4 Germany        2004 Australia              2
# 5 UK             2004 Germany                2
# 6 US             2004 UK                     5

score 2 · Answer 3 · answered Jun 01 '21 at 07:54

2

Here is the {dplyr} approach:

data %>%
  group_by(country_birth, year, live_in) %>%
  summarise(total_migrants = sum(total_migrants))

You can learn more about grouped summaries by reading the dplyr documentation or at R for Data Science.

answered Jun 01 '21 at 07:54

wurli

2,314
10
17

Sum up observations from data frame in R (multiple conditions)

3 Answers3