I would like to plot an age pyramid in R, similar to "Population pyramid plot with ggplot2 and dplyr (instead of plyr)".

The problem is that my data is already aggregated by subgroup. So I don't want to count the number of occurrences of age 65; I want the sum of the number column over all rows with age 65.

For example:

df = structure(list(number = c(26778, 28388, 23491, 18602, 15787, 
24536), gender = c("F", "M", "F", "M", "F", "M"), age = c(65, 
65, 65, 65, 74, 58)), .Names = c("number", "gender", "age"), row.names = c(142L, 
234L, 243L, 252L, 298L, 356L), class = "data.frame")

How should I change this code:

library("ggplot2")
ggplot(data = df, aes(x = age, fill = gender)) + 
  geom_bar(data = subset(df, gender == "M")) + 
  geom_bar(data = subset(df, gender == "F"), 
           mapping = aes(y = - ..count.. ),
           position = "identity") +
  scale_y_continuous(labels = abs) +
  coord_flip()
RockScience

1 Answer

You could sum the number column by gender and age beforehand and then pass the result on to ggplot, like below:

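library(dplyr)
library(ggplot2)

# Sum the pre-aggregated counts (not the ages) per gender and age
df1 <- df %>% group_by(gender, age) %>% summarise(total = sum(number))

ggplot(data = df1, aes(x = age, y = total, fill = gender)) + 
  geom_bar(data = filter(df1, gender == "F"), stat = "identity") + 
  geom_bar(data = filter(df1, gender == "M"), stat = "identity", aes(y = -total)) + 
  scale_y_continuous(labels = abs) +  # show magnitudes instead of negative values
  coord_flip()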
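
If you would rather avoid two separate layers, here is a minimal alternative sketch of the same idea. It assumes the sample data from the question, uses base R's aggregate() for the summing step, and stores the totals with a sign so that a single geom_col() layer draws both sides. The names totals and signed are only illustrative, and which gender goes on the negative side is an arbitrary choice.

```r
library(ggplot2)

# Sample data from the question
df <- data.frame(
  number = c(26778, 28388, 23491, 18602, 15787, 24536),
  gender = c("F", "M", "F", "M", "F", "M"),
  age    = c(65, 65, 65, 65, 74, 58)
)

# Sum the pre-aggregated counts per gender and age (base R, no dplyr needed)
totals <- aggregate(number ~ gender + age, data = df, FUN = sum)

# Negate one gender so its bars extend to the other side of zero
# (here "F"; the choice of side is arbitrary)
totals$signed <- ifelse(totals$gender == "F", -totals$number, totals$number)

p <- ggplot(totals, aes(x = age, y = signed, fill = gender)) +
  geom_col() +                        # plots the y values as given, no counting
  scale_y_continuous(labels = abs) +  # show magnitudes on the axis
  coord_flip()
p
```

With the sample data, age 65 should come out as 50269 for F and 46990 for M, which is an easy way to check that the bars show sums of number rather than row counts.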

ab90hi