How do I change the way my data is presented in a data frame in R?

Question

My data is currently presented as a frequency table of age and gender sorted by each region (as follows as an example of a couple of rows)

Region	age	gender	population
1	0	M	114
1	0	F	87
1	1	M	36
1	1	F	54

It is a large data set and has 500 regions and data for males and females aged 0-55. I am wanting to present the data with each region as one row, with total male and female populations being split into separate columns. It would also help to be able to have columns showing total population <30 and total population >=30 for each region. e.g.:

Region	M	F	Total	<30	>=30
1	150	215	365	200	165
2	85	75	160	95	65

Thanks in advance.

You have been migrated to Stack Overflow, welcome! Do you mind to work through our brief tutotial on [how to make reproducible examples](https://stackoverflow.com/a/5963610/6574038)? I'm sure we can help you! — jay.sf, May 20 '21 at 15:47

score 1 · Answer 1 · answered May 20 '21 at 15:48

Using the tidyverse you could do something like this (not tested on your actual data)...

library(dplyr)

df %>% group_by(Region) %>%
   summarise(M = sum(population[gender == "M"]),
             F = sum(population[gender == "F"]),
             Total = sum(population),
             under30 = sum(population[age<30]),
             over30 = sum(population[age>=30]))

If you've got any missing data, you might want to add na.rm = TRUE as an option in the sum statements.

How do I change the way my data is presented in a data frame in R?

1 Answers1