
Let's say we have the following hierarchical data on the habitats that make up my fantasy island (which is of course always warm and sunny!):

library(dplyr)    # %>%, group_by(), summarise(), mutate()
library(ggplot2)  # plotting

set.seed(1)

hab_dat <- data.frame(
  habitat_type = rep(c("sea", "coast", "land"), times = 1, each = 3),
  habitat_name = c("rocky", "sandy", "seaweed",
                  "beach", "pebbles", "rockpools",
                  "fields", "hills", "forest"),
  area_km2 = sample(10:40, size = 9))

hab_dat

I want to plot the total area of each habitat type, so I wrote the following code:

hab_dat %>% 
  group_by(habitat_type) %>% 
  summarise(area_km2 = sum(area_km2)) %>%
  ggplot(aes(x = habitat_type, y = area_km2, fill = habitat_type)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("gold", "forestgreen", "blue"))

Looks good, but the legend is not very informative. I would like the habitats contained within each habitat type to be listed in the legend under the appropriate habitat type, purely as qualitative information. Here is an example I made in Paint (image: mock-up of the desired legend).

I can get a bit closer with the following code without affecting the appearance of the plot. However, the habitat_type titles are missing from the legend, and there are multiple tiles for the same colour.

hab_dat <- hab_dat %>% mutate(col = rep(c("blue", "gold", "forestgreen"), times = 1, each = 3))

pal <- setNames(as.character(hab_dat$col), as.character(hab_dat$habitat_name))

ggplot(hab_dat, aes(x = habitat_type, y = area_km2, fill = habitat_name)) +
  geom_bar(position = "stack", stat = "identity") +
  scale_fill_manual(values = pal)

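Since my real data is larger, the palette would ideally not depend on row order the way the rep() call above does. A minimal sketch of one way to do that, assuming each habitat type has exactly one colour; type_cols is a name introduced here for illustration, not something from a package:

```r
# Same habitats as above; the areas are not needed to build the palette.
hab_dat <- data.frame(
  habitat_type = rep(c("sea", "coast", "land"), each = 3),
  habitat_name = c("rocky", "sandy", "seaweed",
                   "beach", "pebbles", "rockpools",
                   "fields", "hills", "forest")
)

# Assumption: one colour per habitat type, keyed by the type name.
type_cols <- c(sea = "blue", coast = "gold", land = "forestgreen")

# Look up each row's colour by its habitat type, then name the result by
# habitat so it can be passed to scale_fill_manual(values = pal).
# as.character() guards against habitat_type being stored as a factor.
pal <- setNames(
  unname(type_cols[as.character(hab_dat$habitat_type)]),
  hab_dat$habitat_name
)

pal
```

This only replaces the hard-coded colour column; the legend still shows one tile per habitat.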

I have been looking at solutions along the lines of this one, but I am after something more automated because my actual data is somewhat larger than this, and something that shows the colour tile only once per group, as in my drawing.

rainbird

1 Answer


I don't think there is an elegant solution that addresses your problem. Instead, I'd suggest formatting the legend labels so that they imply the hierarchy.

library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(1)

hab_dat <- data.frame(
  habitat_type = rep(c("sea", "coast", "land"), times = 1, each = 3),
  habitat_name = c("rocky", "sandy", "seaweed",
                   "beach", "pebbles", "rockpools",
                   "fields", "hills", "forest"),
  area_km2 = sample(10:40, size = 9))

# Format labels
labels <- split(hab_dat$habitat_name, hab_dat$habitat_type)
labels <- unlist(Map(function(top, bottom) {
  paste0(top, "\n", paste("- ", bottom, collapse = "\n"))
}, top = names(labels), bottom = labels))


hab_dat %>% 
  group_by(habitat_type) %>% 
  summarise(area_km2 = sum(area_km2)) %>%
  ggplot(aes(x = habitat_type, y = area_km2, fill = habitat_type)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(
    values = c("gold", "forestgreen", "blue"),
    labels = function(i) {labels[i]} # Lookup label
  )

Created on 2022-07-19 by the reprex package (v2.0.1)
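To see what the lookup returns, it can help to print one of the labels. The sketch below repeats the label construction, with the vector renamed to type_labels so it does not mask base R's labels(). It also names the colours by habitat type, so the colour mapping no longer relies on the alphabetical order of the levels. Both type_labels and type_cols are names introduced here for illustration, and the named colour vector is an optional variation rather than something the approach requires.

```r
# Rebuild the example data used above.
set.seed(1)
hab_dat <- data.frame(
  habitat_type = rep(c("sea", "coast", "land"), each = 3),
  habitat_name = c("rocky", "sandy", "seaweed",
                   "beach", "pebbles", "rockpools",
                   "fields", "hills", "forest"),
  area_km2 = sample(10:40, size = 9)
)

# One multi-line label per habitat type, named by habitat type.
by_type <- split(hab_dat$habitat_name, hab_dat$habitat_type)
type_labels <- unlist(Map(function(top, bottom) {
  paste0(top, "\n", paste("- ", bottom, collapse = "\n"))
}, top = names(by_type), bottom = by_type))

# Print one label to see the text the legend will display.
cat(type_labels[["sea"]], "\n")

# Assumption: colours keyed by habitat type instead of by position.
type_cols <- c(coast = "gold", land = "forestgreen", sea = "blue")

# Total area per habitat type (base R equivalent of group_by + summarise).
totals <- aggregate(area_km2 ~ habitat_type, data = hab_dat, FUN = sum)

# Draw the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(totals, aes(x = habitat_type, y = area_km2, fill = habitat_type)) +
    geom_col() +
    scale_fill_manual(
      values = type_cols,
      labels = function(x) type_labels[x]  # look up the label by type name
    )
  print(p)
}
```

Because both vectors are keyed by habitat type, adding or reordering habitat types in the data should not silently swap colours or labels.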

teunbrand
  • Thanks for this - it's definitely a step closer. – rainbird Jul 19 '22 at 13:56
  • @rainbird can you explain why it is only a step closer and not a complete solution? It's difficult to see what needs to be improved in this answer to meet your specification. – Allan Cameron Jul 19 '22 at 14:00
  • I am trying to arrange it so that the colour tile sits next to the habitat type only. This may seem picky, but in a version of the plot with different numbers of subcategories, the colour tiles in the legend end up with different heights. – rainbird Jul 19 '22 at 14:13
  • Thanks again for this @teunbrand. Accepted as the closest answer possible and works great. – rainbird Jul 21 '22 at 10:12