1

I'm currently facing the following issue and would highly appreciate any help. My data frame looks like this

country_birth   year    migrants   live_in                gender
Albania         2000      1        Australia           male
Germany         2000      2        Australia           female
Albania         2008      3        Australia           male
Albania         2000      6        Australia           female
Germany         2004      2        Australia           female
UK               2004      2         Germany           female
US               2004      5          UK                male

Now I would like to get the sum of migrants (both gender) for the same country of birth and the same live_in country for a matching year. A new dataframe should look something like this


country_birth   year    total_migrants   live_in                
Albania         2000      7              Australia           
...             ...      ...                ...

Many thanks in advance!

MixedModeler
  • 125
  • 7

3 Answers3

4

You can try aggregate + subset like below

> aggregate(migrants ~ ., subset(df, select = -gender), sum)
  country_birth year   live_in migrants
1       Albania 2000 Australia        7
2       Germany 2000 Australia        2
3       Germany 2004 Australia        2
4       Albania 2008 Australia        3
5            UK 2004   Germany        2
6            US 2004        UK        5

where

  • subset omits the columns gender
  • aggregate helps you aggregate migrants, grouped by all other columns.
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
3
library(tidyverse)
data %>% 
  count(country_birth, year, live_in, wt = migrants, name = "total_migrants")


# # A tibble: 6 x 4
#   country_birth  year live_in   total_migrants
#   <chr>         <dbl> <chr>              <dbl>
# 1 Albania        2000 Australia              7
# 2 Albania        2008 Australia              3
# 3 Germany        2000 Australia              2
# 4 Germany        2004 Australia              2
# 5 UK             2004 Germany                2
# 6 US             2004 UK                     5
Zaw
  • 1,434
  • 7
  • 15
2

Here is the {dplyr} approach:

data %>%
  group_by(country_birth, year, live_in) %>%
  summarise(total_migrants = sum(total_migrants))

You can learn more about grouped summaries by reading the dplyr documentation or at R for Data Science.

wurli
  • 2,314
  • 10
  • 17