I am trying to create a population pyramid, faceted across a number of regions. The problem is that these regions have very different population sizes, which causes problems for the scale on the y axis (flipped with coord_flip()).

I am trying to follow the method described here: https://rpubs.com/walkerke/pyramids_ggplot2 - which makes the pyramid by using negative numbers for one sex and then uses scale_y_continuous() to get rid of the negative numbers.

First I create a simple sample dataset with two age groups across two countries:

country <- c(1, 1, 1, 1, 2, 2, 2, 2)
age.range <- c("0-4", "0-4", "5-9", "5-9", "0-4", "0-4", "5-9", "5-9")
sex <- rep(c("M", "F"), times = 4)
pop <- c(-8, 9, -9, 8, -88, 99, -99, 88)
pop.pyr <- data.frame(country, age.range, sex, pop)

pop.pyr

  country age.range sex pop
1       1       0-4   M  -8
2       1       0-4   F   9
3       1       5-9   M  -9
4       1       5-9   F   8
5       2       0-4   M -88
6       2       0-4   F  99
7       2       5-9   M -99
8       2       5-9   F  88

I can build the population pyramid and facet by country:

library(ggplot2)

ggplot(pop.pyr, aes(x = age.range, y = pop, fill = sex)) + 
  geom_col(data = subset(pop.pyr, sex == "M")) +
  geom_col(data = subset(pop.pyr, sex == "F")) +
  coord_flip() +
  facet_wrap(~ country, scales = "free_x")

To fix the negative numbers on the y scale (horizontal after coord_flip()), I need to use scale_y_continuous(). But doing so means I have to pick a single scale_y_continuous() for both facets, which doesn't work:

ggplot(pop.pyr, aes(x = age.range, y = pop, fill = sex)) + 
  geom_col(data = subset(pop.pyr, sex == "M")) +
  geom_col(data = subset(pop.pyr, sex == "F")) +
  coord_flip() +
  facet_wrap(~ country, scales = "free_x") +
  scale_y_continuous(breaks = seq(-100, 100, 20), labels = abs(seq(-100, 100, 20)))

The only way around this is to use a small value for the by argument of seq(), e.g. scale_y_continuous(breaks = seq(-100, 100, 2), labels = abs(seq(-100, 100, 2))). Doing so, however, makes the larger scale a mess.

Is there a way to set scale_y_continuous() so that I can have a different scale in each facet, while keeping scales = "free_x"? Otherwise, is there another way to get rid of the negative numbers in the pyramid using something other than scale_y_continuous()?

If not, is the only way to do this to build each plot separately and then combine them with ggarrange() or cowplot, essentially faceting manually?

EDIT:

I tried using the facetscales package as suggested in the comments, but I couldn't get it to work the way I wanted, nor could I fully understand the man file.

Using the instructions from https://github.com/zeehio/facetscales, I installed and loaded the package:

library(facetscales)

Then I create the list of scales:

scales.pyr <- list(
  `1` = scale_y_continuous(breaks = seq(-10, 10, 2), labels = abs(seq(-10, 10, 2))),
  `2` = scale_y_continuous(breaks = seq(-100, 100, 20), labels = abs(seq(-100, 100, 20)))
)

Update ggplot:

ggplot(pop.pyr, aes(x = age.range, y = pop, fill = sex)) + 
  geom_col(data = subset(pop.pyr, sex == "M")) +
  geom_col(data = subset(pop.pyr, sex == "F")) +
  coord_flip() +
  facet_grid_sc(rows= vars(country), scales = list(y = scales.pyr))

This is clearly not right. The man file (https://github.com/zeehio/facetscales/blob/master/man/facet_grid_sc.Rd) says that I can use cols:

facet_grid_sc(rows = NULL, cols = NULL, scales = "fixed", space = "fixed", shrink = TRUE, labeller = "label_value", as.table = TRUE, switch = NULL, drop = TRUE, margins = FALSE, facets = NULL)
...
\item{cols}{A set of variables or expressions quoted by \code{\link[=vars]{vars()}} and defining faceting groups on the rows or columns dimension. The variables can be named (the names are passed to \code{labeller}).

If I try cols:

ggplot(pop.pyr, aes(x = age.range, y = pop, fill = sex)) + 
  geom_col(data = subset(pop.pyr, sex == "M")) +
  geom_col(data = subset(pop.pyr, sex == "F")) +
  coord_flip() +
  facet_grid_sc(cols = vars(country), scales = list(y = scales.pyr))

I get:

Error in .subset2(x, i, exact = exact) : 
  attempt to select less than one element in get1index

As is also clear, the scales are fixed. The man page says that I can use scales = "free" or the deprecated "free_x":

\item{scales}{A list of two elements (x and y). Each element can be either "fixed" (scale limits shared across facets), "free" (with varying limits per facet), or a named list, with a different scale for each facet value. Previous scale values ("fixed", "free_x", "free_y", "free" are accepted but soft-deprecated).}

But the code example requires the scales parameter to be filled with the list of scales.

Finally, I would really like to have six regions in two rows of three. The man page indicates that I can use rows and cols to facet on different variables, but I can't see any reference to nrow or ncol arguments for wrapping a single variable. Using them in a larger example gives: unused argument (ncol = 3).

MorrisseyJ
  • Take a look at the `facetscales` package: https://github.com/zeehio/facetscales – markus Jan 15 '19 at 22:28
  • Possible duplicate of [How do you set different scale limits for different facets?](https://stackoverflow.com/questions/4276218/how-do-you-set-different-scale-limits-for-different-facets) – markus Jan 15 '19 at 22:29
  • Does `scale_y_continuous(labels = abs)` work for you? You don't have to specify the breaks explicitly that way. – Z.Lin Jan 16 '19 at 05:30
  • Also, you don't need two `geom_col` layers with subsetted data sources. The link you referenced was referencing [this solution](https://stackoverflow.com/questions/14680075/simpler-population-pyramid-in-ggplot2), where the conversion to negative numbers was done in the second geom layer. Since yours is already negative, splitting into two has no additional benefit. – Z.Lin Jan 16 '19 at 05:35
  • Thanks @Z.Lin that works: `scale_y_continuous(labels = abs)` sorts things out. Also, you are right no need for the two geom_col() calls. Solution provided below. – MorrisseyJ Jan 16 '19 at 23:16

2 Answers

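The answer is simple: use scale_y_continuous(labels = abs). Passing a function to labels applies it to whatever breaks each facet ends up with, so there is no need to specify the breaks explicitly. The final code looks as follows: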

country <- c(1, 1, 1, 1, 2, 2, 2, 2)
age.range <- c("0-4", "0-4", "5-9", "5-9", "0-4", "0-4", "5-9", "5-9")
sex <- rep(c("M", "F"), times = 4)
pop <- c(-8, 9, -9, 8, -88, 99, -99, 88)
pop.pyr <- data.frame(country, age.range, sex, pop)

library(ggplot2)

ggplot(pop.pyr, aes(x = age.range, y = pop, fill = sex)) + 
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = abs) +
  facet_wrap(~ country, scales = "free_x")

That was a long way around. Thanks @Z.Lin.
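
For the six-region layout mentioned in the question, facet_wrap() accepts nrow and ncol, so the same approach should carry over. Below is a minimal sketch, not the real data: the `regions` data frame, its six regions and the size multipliers are all made up for illustration.

```r
library(ggplot2)

# Hypothetical sample data: six regions, two age groups, two sexes.
# expand.grid() varies the first column fastest, so each region's rows are
# M 0-4, F 0-4, M 5-9, F 5-9.
regions <- expand.grid(
  sex       = c("M", "F"),
  age.range = c("0-4", "5-9"),
  region    = 1:6,
  stringsAsFactors = FALSE
)

# Made-up population sizes: a small base pattern scaled per region so that
# the regions differ widely in size.
base.pop    <- c(8, 9, 9, 8)
multipliers <- c(1, 10, 100, 2, 20, 200)
regions$pop <- rep(base.pop, times = 6) * rep(multipliers, each = 4)

# Store males as negative numbers so their bars extend to the left.
regions$pop <- ifelse(regions$sex == "M", -regions$pop, regions$pop)

p <- ggplot(regions, aes(x = age.range, y = pop, fill = sex)) +
  geom_col() +
  coord_flip() +
  # labels = abs is applied to each facet's own breaks
  scale_y_continuous(labels = abs) +
  # two rows of three panels, each with its own population axis
  facet_wrap(~ region, scales = "free_x", ncol = 3)

p
```

Because labels takes a function rather than a fixed vector, each panel keeps the breaks ggplot2 picks for its own range and only the sign is dropped from the labels.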

MorrisseyJ

I think this is now possible using facetted_pos_scales() from the ggh4x package (https://teunbrand.github.io/ggh4x/reference/facetted_pos_scales.html).
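
A minimal sketch of how that might look, assuming ggh4x is installed. This is not code from the question: to keep the per-facet scales as ordinary x scales, it drops coord_flip() and maps pop to x directly, which assumes ggplot2 3.3.0 or later. The break spacings are the ones the question tried.

```r
library(ggplot2)
library(ggh4x)

# Sample data from the question (males stored as negative numbers).
pop.pyr <- data.frame(
  country   = c(1, 1, 1, 1, 2, 2, 2, 2),
  age.range = c("0-4", "0-4", "5-9", "5-9", "0-4", "0-4", "5-9", "5-9"),
  sex       = rep(c("M", "F"), times = 4),
  pop       = c(-8, 9, -9, 8, -88, 99, -99, 88)
)

p <- ggplot(pop.pyr, aes(x = pop, y = age.range, fill = sex)) +
  geom_col() +
  # the scales must be free for per-facet scales to take effect
  facet_wrap(~ country, scales = "free_x") +
  # one x scale per panel, in panel order
  facetted_pos_scales(
    x = list(
      scale_x_continuous(breaks = seq(-10, 10, 2), labels = abs),
      scale_x_continuous(breaks = seq(-100, 100, 20), labels = abs)
    )
  )

p
```

This gives explicit control over the breaks in each facet. If the default breaks are acceptable, scale_y_continuous(labels = abs) alone is simpler.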

enovap