0

I'm currently trying to reproduce a barchart that I saw in the IPCC Special Report on the Ocean and Cryosphere (see figure below || please note that there are no y-axis and x-axis labels because the figure is taken from a larger figure in the respective report, see figure 2.1 here: https://www.ipcc.ch/srocc/chapter/chapter-2/ ). The horizontal bold line represents the median value for the data while the upper and lower limit represent the 75th and 25th percentile, respectively. Thus, you could say that the grey box is a type of boxplot. My goal is to add this 'boxplot' for my data into my plot.

I was able to reproduce the back-to-back barchart (see figure below).

However, I am struggling to integrate the boxplot into this barchart. I tried to do this by using geom_boxplot() (code examples further below) but it produced an error, saying:

Error in geom_boxplot(): ! Problem while computing aesthetics. ℹ Error occurred in the 1st layer. Caused by error in FUN(): ! object 'Umgebung' not found

The other idea I had was to calculate the percentiles (incl. 25th, median and 75th) with the quantile() function and then add the desired percentiles by using geom_vline(), yet I wouldn't be able to create the grey box in the original plot with this. So I'd prefer to actually use a boxplot in the figure or create the grey box in another way. Unfortunately I'm running out of ideas and cannot make it work, which is why I'm wondering if anyone has an idea how to do this in ggplot?

Here is the data that I used (plotting_df2 is the data for the back-to-back barchart and df_ZDI is the data for the boxplot that I'd like to add) and the code for my current back-to-back barchart. Please note that the geom_boxplot() call that throws the error mentioned above is a comment in this code example. Also, the column 'Area_perc' filled with zeroes in df_ZDI was created to center the boxplot once it is added to the back-to-back barchart.

I'm glad for any help :)

library("dplyr")
library("ggplot2")

#data
plotting_df2 <- structure(list(Height_range = c("2600 - <2800", "2800 - <3000", 
    "3000 - <3200", "3200 - <3400", "3400 - <3600", "3600 - <3800", 
    "3800 - <4000", "4000 - <4200", "4200 - <4400", "4400 - <4600", 
    "4600 - <4800", "4800 - <5000", "5000 - <5200", "5200 - <5400", 
    "5400 - <5600", "5600 - <5800", "5800 - <6000", "6000 - <6200", 
    "6200 - <6400", "6400 - <6600", "6600 - <6800", "6800 - <7000", 
    "2600 - <2800", "2800 - <3000", "3000 - <3200", "3200 - <3400", 
    "3400 - <3600", "3600 - <3800", "3800 - <4000", "4000 - <4200", 
    "4200 - <4400", "4400 - <4600", "4600 - <4800", "4800 - <5000", 
    "5000 - <5200", "5200 - <5400", "5400 - <5600", "5600 - <5800", 
    "5800 - <6000", "6000 - <6200", "6200 - <6400", "6400 - <6600", 
    "6600 - <6800", "6800 - <7000"), Area_perc = c(0, -0.00813093730353531, 
    -0.260883801237534, -1.34685081373276, -4.12725403762506, -5.8860804561115, 
    -6.69221176709734, -7.43112895264885, -8.77252668268104, -12.0759025431697, 
    -14.691106543883, -13.4787204972044, -9.6733444626552, -7.30287193712887, 
    -3.61505367574966, -2.20305798709396, -1.17985012540267, -0.739452756272111, 
    -0.260689048248227, -0.179416191398369, -0.0569287331865788, 
    -0.0185380501695873, 0, 0.0210832065597772, 0.371033084959052, 
    1.31854295448614, 4.72853608089054, 10.711542546146, 22.9105680160703, 
    29.9335291171623, 20.4001968810962, 7.655966743397, 1.64776231937217, 
    0.279744368079868, 0.0214946817807394, 0, 0, 0, 0, 0, 0, 0, 0, 
    0), Umgebung = c("Glacial", "Glacial", "Glacial", "Glacial", 
    "Glacial", "Glacial", "Glacial", "Glacial", "Glacial", "Glacial", 
    "Glacial", "Glacial", "Glacial", "Glacial", "Glacial", "Glacial", 
    "Glacial", "Glacial", "Glacial", "Glacial", "Glacial", "Glacial", 
    "Periglacial", "Periglacial", "Periglacial", "Periglacial", "Periglacial", 
    "Periglacial", "Periglacial", "Periglacial", "Periglacial", "Periglacial", 
    "Periglacial", "Periglacial", "Periglacial", "Periglacial", "Periglacial", 
    "Periglacial", "Periglacial", "Periglacial", "Periglacial", "Periglacial", 
    "Periglacial", "Periglacial")), class = c("grouped_df", "tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -44L), groups = structure(list(
        Height_range = c("2600 - <2800", "2600 - <2800", "2800 - <3000", 
        "2800 - <3000", "3000 - <3200", "3000 - <3200", "3200 - <3400", 
        "3200 - <3400", "3400 - <3600", "3400 - <3600", "3600 - <3800", 
        "3600 - <3800", "3800 - <4000", "3800 - <4000", "4000 - <4200", 
        "4000 - <4200", "4200 - <4400", "4200 - <4400", "4400 - <4600", 
        "4400 - <4600", "4600 - <4800", "4600 - <4800", "4800 - <5000", 
        "4800 - <5000", "5000 - <5200", "5000 - <5200", "5200 - <5400", 
        "5200 - <5400", "5400 - <5600", "5400 - <5600", "5600 - <5800", 
        "5600 - <5800", "5800 - <6000", "5800 - <6000", "6000 - <6200", 
        "6000 - <6200", "6200 - <6400", "6200 - <6400", "6400 - <6600", 
        "6400 - <6600", "6600 - <6800", "6600 - <6800", "6800 - <7000", 
        "6800 - <7000"), Umgebung = c("Glacial", "Periglacial", "Glacial", 
        "Periglacial", "Glacial", "Periglacial", "Glacial", "Periglacial", 
        "Glacial", "Periglacial", "Glacial", "Periglacial", "Glacial", 
        "Periglacial", "Glacial", "Periglacial", "Glacial", "Periglacial", 
        "Glacial", "Periglacial", "Glacial", "Periglacial", "Glacial", 
        "Periglacial", "Glacial", "Periglacial", "Glacial", "Periglacial", 
        "Glacial", "Periglacial", "Glacial", "Periglacial", "Glacial", 
        "Periglacial", "Glacial", "Periglacial", "Glacial", "Periglacial", 
        "Glacial", "Periglacial", "Glacial", "Periglacial", "Glacial", 
        "Periglacial"), .rows = structure(list(1L, 23L, 2L, 24L, 
            3L, 25L, 4L, 26L, 5L, 27L, 6L, 28L, 7L, 29L, 8L, 30L, 
            9L, 31L, 10L, 32L, 11L, 33L, 12L, 34L, 13L, 35L, 14L, 
            36L, 15L, 37L, 16L, 38L, 17L, 39L, 18L, 40L, 19L, 41L, 
            20L, 42L, 21L, 43L, 22L, 44L), ptype = integer(0), class = c("vctrs_list_of", 
        "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
    ), row.names = c(NA, -44L), .drop = TRUE))

df_ZDI <- structure(list(elev_ZDI = c(3307.20835232205, 1331.87925720215, 
222.621368887689, 836.756324132284, 2066.40147484673, 3066.89243265788, 
1407.60896335178, 491.074270799425, 1518.27969767253, 1872.66952904595, 
2764.18468865289, 834.062467659844, 429.105528672536, 1711.29469570584, 
1818.18913269043, 2923.67825486925, 1310.91319944594, 405.303607556555, 
1356.16757982042, 1656.92714335124, 3176.3096818712, 1785.96018914117, 
323.876850085788, 742.874162546794, 1899.74567159017, 3011.39425354004, 
1974.31221652561, 317.509443208906, 439.012686824799, 2080.99008568658, 
3245.08500908746, 2500.59694722493, 642.457545979818, 674.900856865777, 
2560.82551439073, 3122.98277079264, 2156.09715440538, 621.882653978136, 
556.954903009203, 2213.49671427409, 3209.67580227322, 1722.6931822035, 
482.353815809886, 909.885089916653, 2000.93998396132, 2909.65182291667, 
1283.94950095283, 563.962105886141, 1682.45457678901, 2186.01452416314, 
2192.81169128418, 991.659460300869, 897.015666135152, 1805.44834476047, 
2088.38637356228), Area_perc = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), levels = "0", class = "factor")), class = "data.frame", row.names = c(NA, 
-55L))

#code for back-to-back barchart

plot <- plotting_df2 %>% 
  ggplot(aes(x = Height_range, y = Area_perc, group = Umgebung, fill = Umgebung)) +
  geom_bar(stat = "identity", width = 0.75) +
  #geom_boxplot(data = df_ZDI, aes(x = elev_ZDI, y = Area_perc)) +
  coord_flip() +
  scale_x_discrete() +
  # another trick!
  scale_y_continuous(breaks = seq(-35, 35, 10), 
                     labels = abs(seq(-35, 35, 10)),
                     limits = c(-35,  35), expand = c(0,0)) +
  labs(x = "Elevation [m. a.s.l.]", y = expression("Area [%]"), title = "") +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5),
        legend.title = element_blank(),
        axis.title.y = element_text(margin = margin(t = 0, r = 5, b = 0, l = 0)), #move y-axis title further away from axis labels
        panel.grid.minor = element_blank()) +
  guides(fill = guide_legend(reverse = TRUE)) +   # reverse order of items in legend
  scale_fill_manual(values=c("#AD355D", "#440A68"),  # change colors of bars
                    name="",
                    breaks=c("Periglacial", "Glacial"),
                    labels=c("Periglacial", "Glacial"))
plot
Phil
  • 85
  • 7
  • Related, maybe helpful https://stackoverflow.com/questions/49003863/how-to-plot-a-hybrid-boxplot-half-boxplot-with-jitter-points-on-the-other-half , also take a look at https://erocoar.github.io/gghalves/ – Andre Wildberg Mar 14 '23 at 15:53

1 Answers1

2

The first issue with your code and which gives rise to the error is that geom_boxplot inherits the global group and fill aesthetics, which could be fixed by adding inherit.aes=FALSE.

However, even after fixing that you have a discrete x scale whereas in geom_boxplot you want to map a continuous variable on x. To fix that one option would be to convert your ranges to numerics and draw your bars via a geom_rect. To this end I use tidyr::separate_wider_regex (which requires tidyr >= 1.3.0) to split the ranges into a lower and upper column.

Note: A second option would of course be to draw the grey box as a rect as well and a segment for the median line.

library(ggplot2)
library(tidyr)
library(dplyr)

plotting_df2 <- plotting_df2 |>
  tidyr::separate_wider_regex(Height_range,
    patterns = c(lower = "\\d+", ".*?", upper = "\\d+"),
    cols_remove = FALSE
  ) |>
  mutate(across(c(lower, upper), as.numeric))

plotting_df2 %>%
  ggplot(aes(ygroup = Umgebung, fill = Umgebung)) +
  geom_rect(aes(ymin = lower + 10, ymax = upper - 10, xmin = 0, xmax = Area_perc)) +
  geom_boxplot(data = df_ZDI, aes(y = elev_ZDI), inherit.aes = FALSE, width = 20) +
  scale_x_continuous(
    breaks = seq(-35, 35, 10),
    labels = abs(seq(-35, 35, 10)),
    limits = c(-35, 35), expand = c(0, 0)
  ) +
  labs(x = "Elevation [m. a.s.l.]", y = expression("Area [%]"), title = "") +
  theme(
    legend.position = "bottom",
    plot.title = element_text(hjust = 0.5),
    legend.title = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 5, b = 0, l = 0)), # move y-axis title further away from axis labels
    panel.grid.minor = element_blank()
  ) +
  guides(fill = guide_legend(reverse = TRUE)) + # reverse order of items in legend
  scale_fill_manual(
    values = c("#AD355D", "#440A68"), # change colors of bars
    name = "",
    breaks = c("Periglacial", "Glacial"),
    labels = c("Periglacial", "Glacial")
  )

enter image description here

stefan
  • 90,330
  • 6
  • 25
  • 51