I've written the following code to draw a pyramid plot split by gender. Instead, I'm getting only half a pyramid; the other half isn't visible. My Excel file has four columns: Grade, Number, Age, Gender.

library(xlsx)
library(ggplot2)
library(plyr)

data1 <- read.xlsx("C:/Users/cameron.kashani/Documents/KPIs/R/Dummy KPI data.xlsx"
,sheetIndex=5,rowIndex=1:11,colIndex=1:4)

data1df<-data.frame(data1)

pyramid1 <- ggplot(data1df, aes(x = Grade, y = Number,fill=Age)) + 
  geom_bar(data=subset(data1df, data1df$Gender == "Female"), stat = "identity") + 
  geom_bar(data=subset(data1df, data1df$Gender == "Male"), stat = "identity") +
  scale_y_continuous(breaks = seq(-50, 50, 5),
                     labels=abs(seq(-50, 50, 5)))+
  coord_flip()+
  theme_bw()

pyramid1
jay.sf
Cam K
  • Welcome to Stack Overflow. Please [provide example data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) with `dput(data1df)` in order to make your issue reproducible. – jay.sf Jul 09 '18 at 13:49
  • A few other comments: (a) in `subset` you don't need to repeat the name of the data frame; `subset(data1df, Gender == "Female")` works just fine, (b) I think if you set `group = Gender` inside `aes()`, you would only need a single `geom_bar` layer (but I can't test without a reproducible example), (c) you don't need `plyr` for any of the code you've shown. – Gregor Thomas Jul 09 '18 at 13:55
  • Hi jaySf, struggling to figure out how to add code in comments. Gregor is right, the data frame name doesn't need to be repeated! I have two geom_bar layers because I have added two colour scales. Can't figure out how to add two legends but that's a completely different question probably. I'll remove library(plyr). Thank you!! – Cam K Jul 09 '18 at 16:04
  • It's okay if you can't figure out how to put code in comments, since it should go in the post anyway. You can edit your post to put your data there – camille Jul 09 '18 at 16:13

2 Answers

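I solved this by multiplying Number by -1 in one of the two geom_bar layers (the Female one), so its bars extend in the opposite direction from the Male bars instead of sitting on the same side. Makes sense in hindsight.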

pyramid1 <- ggplot(data1df, aes(x = Grade, y = Number, fill=Age)) + 
  geom_bar(data=subset(data1df, Gender == "Male"), stat = "identity") + 
  geom_bar(data=subset(data1df, Gender == "Female"),aes(y=Number*(-1)), stat = "identity") +
  scale_y_continuous(breaks = seq(-50, 50, 5),
                     labels=abs(seq(-50, 50, 5)))+
  coord_flip()+
  theme_bw()
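
As a small aside on the axis: the `breaks` and `labels` arguments above are the same sequence with and without its sign, which is why the negated Female bars still show positive counts. A minimal sketch in base R (the variable names `axis_breaks` and `axis_labels` are only for illustration):

```r
# Positions on the y axis where ticks are drawn
axis_breaks <- seq(-50, 50, 5)

# Text printed at those ticks: same positions, sign dropped
axis_labels <- abs(axis_breaks)

# One label per break, and no label is negative
length(axis_breaks) == length(axis_labels)
all(axis_labels >= 0)
```

If the counts exceed 50 in either direction, the range of `seq()` would need to be widened to match.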
Cam K

It's generally better practice to map a variable to an aesthetic than to repeat geoms; in this case, both of your geom_bar layers are doing essentially the same thing. If you instead create or change a variable ahead of time so that one group's values are negative, there's no need to split the data into two sets of bars. Age pyramids often have bars for women on one side and bars for men on the other, aligned at the same position along the vertical axis, with fill color mapped to gender.

You didn't include data, so I made some up to model what's in your aes.

library(tidyverse)

set.seed(124)
# Made-up data: 15 grades, one row per grade and gender
df <- tibble(age = rep(1:15, times = 2),
             grade = rep(letters[1:15], times = 2),
             number = round(rnorm(30, mean = 100, sd = 20)),
             gender = rep(c("a", "b"), each = 15))

# Negate the counts for one gender so its bars extend the other way
df <- df %>%
  mutate(y = ifelse(gender == "a", number * -1, number))

ggplot(df, aes(x = grade, y = y, fill = age)) +
  geom_col(position = "stack") +
  scale_y_continuous(labels = abs) +  # label the axis with magnitudes
  coord_flip() +
  scale_fill_viridis_c()

Created on 2018-07-09 by the reprex package (v0.2.0).
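
For reference, the same idea can be sketched with the column names from the question (Grade, Number, Age, Gender). The values below are invented, and `pyramid_df`, `SignedNumber` and `pyramid_plot` are hypothetical names used only for this illustration; the real data would come from the spreadsheet instead.

```r
library(ggplot2)

# Hypothetical stand-in for the spreadsheet: one row per Grade and Gender
pyramid_df <- data.frame(
  Grade  = rep(c("A", "B", "C"), times = 2),
  Gender = rep(c("Female", "Male"), each = 3),
  Number = c(12, 20, 8, 15, 18, 10),
  Age    = c(45, 38, 29, 47, 36, 31)
)

# Flip the sign for one gender so its bars extend to the other side of zero
pyramid_df$SignedNumber <- ifelse(pyramid_df$Gender == "Female",
                                  -pyramid_df$Number,
                                  pyramid_df$Number)

# A single geom_col layer now draws both halves of the pyramid
pyramid_plot <- ggplot(pyramid_df,
                       aes(x = Grade, y = SignedNumber, fill = Age)) +
  geom_col() +
  scale_y_continuous(labels = abs) +  # show magnitudes, not signed values
  coord_flip() +
  theme_bw()

pyramid_plot
```

Keeping the original `Number` column and adding a signed copy means the raw counts stay available for anything else done with the data.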

camille
  • Thank you! My data has four columns: Grade (discrete), Number (continuous), Age (continuous), Gender (discrete). Each grade must have both men and women. For each grade's gender, there would be an age and a number (of people). Your answer was very helpful and I have produced the graph now! – Cam K Jul 09 '18 at 16:11