0

My data is currently presented as a frequency table of age and gender sorted by each region (as follows as an example of a couple of rows)

Region age gender population
1 0 M 114
1 0 F 87
1 1 M 36
1 1 F 54

It is a large data set and has 500 regions and data for males and females aged 0-55. I am wanting to present the data with each region as one row, with total male and female populations being split into separate columns. It would also help to be able to have columns showing total population <30 and total population >=30 for each region. e.g.:

Region M F Total <30 >=30
1 150 215 365 200 165
2 85 75 160 95 65

Thanks in advance.

  • 1
    You have been migrated to Stack Overflow, welcome! Do you mind to work through our brief tutotial on [how to make reproducible examples](https://stackoverflow.com/a/5963610/6574038)? I'm sure we can help you! – jay.sf May 20 '21 at 15:47

1 Answers1

1

Using the tidyverse you could do something like this (not tested on your actual data)...

library(dplyr)

df %>% group_by(Region) %>%
   summarise(M = sum(population[gender == "M"]),
             F = sum(population[gender == "F"]),
             Total = sum(population),
             under30 = sum(population[age<30]),
             over30 = sum(population[age>=30]))

If you've got any missing data, you might want to add na.rm = TRUE as an option in the sum statements.

Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32