I have a question about an edge case with ggplot2 in R.

ggplot2 discourages adding multiple legends, but I think this is a valid use case.

I have a large economic dataset with the following variables:

year = year of observation
input_type = *labor* or *supply chain*
input_desc = specific type of input (e.g. plumbers for labor, building supplies for supply chain)
value = percentage of industry spending

I'm building an area chart covering approximately 15 years. There are 39 different input descriptions, so I'd like the user to see the two major components (internal employee spending vs. outsourcing/supply spending) in two major color families (say, greens and blues), but ggplot2 won't let me group my colors that way.

Here are a few things I tried.

Throwaway code to reproduce:

library(ggplot2)

spec_trend_pie <- data.frame(
  year = c(2006, 2006, 2006, 2006, 2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008),
  input_type = c("labor", "labor", "supply", "supply", "labor", "labor", "supply", "supply", "labor", "labor", "supply", "supply"),
  input_desc = c("plumber", "manager", "pipe", "truck", "plumber", "manager", "pipe", "truck", "plumber", "manager", "pipe", "truck"),
  value = c(1, 2, 3, 4, 4, 3, 2, 1, 1, 2, 3, 4)
)
# group by the broad type, fill by the specific description
spec_broad <- ggplot(data = spec_trend_pie, aes(y = value, x = year, group = input_type, fill = input_desc)) + geom_area()

Which gave me

Error in f(...) : Aesthetics can not vary with a ribbon

And then I tried this

sff4 <- ggplot() + 
  geom_area(data=subset(spec_trend_pie, input_type="labor"), aes(y=value, x=variable, group=input_type, fill= input_desc)) +
  geom_area(data=subset(spec_trend_pie, input_type="supply_chain"), aes(y=value, x=variable, group=input_type, fill= input_desc)) 

Which gave me this image, so closer, but not quite there. [image]

To give you an idea of what I'm after, here's an example of something I was able to do in Google Sheets a long time ago. [image]

  • You could manipulate the underlying factor levels of `input_desc` such that first all levels of group1 appear and then those of group2. Then you can specify colour values in `scale_fill_manual(values = c("blue1", "blue2", ..., "grey20","grey30", ..))` so that the colours match the groups. – teunbrand May 15 '19 at 19:27
  • Was just about to make the same suggestion as @teunbrand. I'd create gradients with an outside tool like [this one](http://gka.github.io/palettes/#colors=lightyellow,orange,deeppink,darkred|steps=7|bez=1|coL=1). But I'd also caution you that it will be pretty much impossible to distinguish 39 colors, particularly shades of just a few hues. If at all possible, you might want to collapse groups together or experiment with something like small multiples. – camille May 15 '19 at 19:31
  • Not directly about the code itself, but I recognize these categories from the US census occupation codes. When I work with these, I generally use just the top level in the hierarchy (e.g. sales and office occupations), except for management, business, science, and art, where I use the 2nd level. That lets me minimize the number of distinct categories, particularly ones with relatively small values – camille May 15 '19 at 19:36
  • @teunbrand so I would just manually dictate the order as a pre-step? I suppose that makes sense. Thanks! – mccinthenyc May 15 '19 at 20:46
  • @camille thanks for the tool recommendation. I don't necessarily need everyone to see the different fields, but that's a good point re: mapping the colors back to the legend. I think I'll end up manually pointing and labeling the significant ribbons. – mccinthenyc May 15 '19 at 20:47
  • Going to wait and see if someone else posts a suggestion until I give that a shot, but it's a great backup option. – mccinthenyc May 15 '19 at 20:48
  • Keep in mind that without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), folks can't get much more specific – camille May 15 '19 at 20:52
  • Vary color by `input_type` and alpha by `input_desc`? – Axeman May 15 '19 at 21:29
  • @camille added some code to reproduce – mccinthenyc May 16 '19 at 00:15
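
The factor-ordering idea from the first comment can be sketched on the question's sample data as follows. This is only a minimal sketch, not a tested solution for the full 39-category dataset: the `lookup` and `ramps` objects and the specific colour names are assumptions made for illustration, and `colorRampPalette()` from base R stands in for hand-picked shades.

```r
library(ggplot2)

# The question's sample data, rebuilt compactly
spec_trend_pie <- data.frame(
  year       = rep(2006:2008, each = 4),
  input_type = rep(c("labor", "labor", "supply", "supply"), times = 3),
  input_desc = rep(c("plumber", "manager", "pipe", "truck"), times = 3),
  value      = c(1, 2, 3, 4, 4, 3, 2, 1, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# One row per description, sorted so each input_type's descriptions sit together
lookup <- unique(spec_trend_pie[, c("input_type", "input_desc")])
lookup <- lookup[order(lookup$input_type, lookup$input_desc), ]

# Use that order as the factor levels (controls stacking and legend order)
spec_trend_pie$input_desc <- factor(spec_trend_pie$input_desc,
                                    levels = lookup$input_desc)

# Hypothetical colour ramps: greens for labor, blues for supply
ramps <- list(labor  = c("darkgreen", "lightgreen"),
              supply = c("navy", "lightblue"))

# Named vector: one shade per description, drawn from its type's ramp
fill_values <- unlist(lapply(names(ramps), function(type) {
  descs <- lookup$input_desc[lookup$input_type == type]
  setNames(colorRampPalette(ramps[[type]])(length(descs)), descs)
}))

# No explicit group: ggplot2 then groups by the discrete fill variable
p <- ggplot(spec_trend_pie, aes(x = year, y = value, fill = input_desc)) +
  geom_area() +
  scale_x_continuous(breaks = 2006:2008) +
  scale_fill_manual(values = fill_values)
```

Dropping `group = input_type` matters here: with one group per `input_desc`, the fill no longer varies within a single ribbon, which is what triggered the error in the question.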

1 Answer

It's a bit of a hack, but forcats might help you out. I answered a similar question earlier this week:

How to factor sub group by category?

First, some base data:

library(tidyverse) # tibble, dplyr, forcats, ggplot2

set.seed(123)
raw_data <-
  tibble(
    x = rep(1:20, each = 6),
    rand = sample(1:120, 120) * (x/20), 
    group = rep(letters[1:6], times = 20),
    cat = ifelse(group %in% letters[1:3], "group 1", "group 2")
  ) %>% 
  group_by(group) %>% 
  mutate(y = cumsum(rand)) %>% 
  ungroup() 

Now, use factor levels to create gradients within colors

df <-
  raw_data %>% 
  # create factors for group and category
  mutate(
    group = fct_reorder(group, y, max),
    cat = fct_reorder(cat, y, max) # ordering in the stack
  ) %>% 
  arrange(cat, group) %>% 
  mutate(
    group = fct_inorder(group), # takes the category into account first
    group_fct = as.integer(group), # factor as integer
    hue = as.integer(cat)*(360/n_distinct(cat)), # base hue values
    light_base = 1-(group_fct)/(n_distinct(group)+2), # lightness fraction; falls as the group index rises, the +2 keeps the last group away from 0
    light = floor(light_base * 100) # new L value for hcl()
  ) %>% 
  mutate(hex = hcl(h = hue, l = light))
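
To see what the hue and lightness arithmetic above produces, here is a small base-R sketch. It assumes six groups split evenly across two categories, as in the sample data; `n_groups`, `n_cats`, and `cat_int` are illustrative names that do not appear in the pipeline itself.

```r
n_groups <- 6  # assumed number of distinct groups
n_cats   <- 2  # assumed number of distinct categories

group_fct <- seq_len(n_groups)              # integer codes 1..6
cat_int   <- rep(seq_len(n_cats), each = 3) # category code for each group

# Same formulas as in the pipeline above
hue   <- cat_int * (360 / n_cats)
light <- floor((1 - group_fct / (n_groups + 2)) * 100)

# hcl() is in grDevices; chroma is left at its default, as in the pipeline
hex <- hcl(h = hue, l = light)

data.frame(group_fct, hue, light, hex)
```

Under these assumptions the hues come out as 180 for the first category and 360 for the second, and lightness steps down from 87 to 25 across the six groups, so the shades darken as the group index increases.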

Create a lookup table for scale_fill_manual()

area_colors <-
  df %>% 
  distinct(group, hex)

Lastly, make your plot

ggplot(df, aes(x, y, fill = group)) +
  geom_area(position = "stack") +
  scale_fill_manual(
    values = area_colors$hex,
    labels = area_colors$group
  )
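
One possible variation, offered as a sketch rather than as part of the original approach: passing a named vector to `scale_fill_manual()` matches colours to factor levels by name, so the result does not depend on the row order of the lookup table. The `df_demo` data frame below is a hypothetical stand-in for `df`, with made-up values.

```r
library(ggplot2)

# Hypothetical stand-in for `df`: four groups, each with a precomputed hex colour
df_demo <- data.frame(
  x     = rep(1:3, times = 4),
  y     = c(1, 2, 3, 2, 2, 4, 1, 3, 3, 2, 1, 2),
  group = factor(rep(c("a", "b", "c", "d"), each = 3)),
  hex   = rep(hcl(h = c(180, 180, 360, 360), l = c(80, 60, 40, 20)), each = 3),
  stringsAsFactors = FALSE
)

# Lookup table, then a named vector keyed by group level
area_colors <- unique(df_demo[, c("group", "hex")])
fill_values <- setNames(area_colors$hex, as.character(area_colors$group))

# Colours are matched to levels by name rather than by position
p <- ggplot(df_demo, aes(x, y, fill = group)) +
  geom_area(position = "stack") +
  scale_fill_manual(values = fill_values)
```

With named values the `labels` argument can be omitted, since the legend then uses the factor levels directly.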

yake84