
I'm trying to create several graphs using ggplot. Each graph is a series of bars that together describe a line as well (EXAMPLE). (BTW, yes, I realize the color palette is ugly; it's color-blind friendly, which is important for my audience.)

My issue is that I need to make several of these graphs and I want the colors to stay consistent across all of them. Since the "Type" variable comes up in different orders across the several datasets I'm going to be using, I need to manually set a color for each type. I thought that this question, "How to manually fill colors in a ggplot2 histogram", would have the answer. But when I try that, the names in the legend change to the hex definitions of the colors, while the colors themselves go back to ggplot's default palette.

Here's the code I have so far:

  cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

 ggplot()+
    scale_fill_manual(values=cbbPalette)+
    geom_bar(data=subset(eten, Type=="Waste Wood"), aes(x=Tprod, y=acost, fill=cbbPalette[1], width=MGGEY+25), stat="identity")+
    geom_bar(data=subset(eten, Type=="Agricultural Residue"), aes(x=Tprod, y=acost, fill=cbbPalette[2], width=MGGEY+25), stat="identity")+
    geom_bar(data=subset(eten, Type=="Forest Residue"), aes(x=Tprod, y=acost, fill=cbbPalette[3], width=MGGEY+25), stat="identity")+
    geom_bar(data=subset(eten, Type=="Herbaceous Energy Crop"), aes(x=Tprod, y=acost, fill=cbbPalette[4], width=MGGEY+25), stat="identity")+
    geom_bar(data=subset(eten, Type=="MSW"), aes(x=Tprod, y=acost, fill=cbbPalette[5], width=MGGEY+25), stat="identity")+
    scale_y_continuous("Average Cost", labels = dollar, expand=c(0,0))+
    scale_x_continuous("Million Gallons of Gasoline Equivalent", expand=c(0,0))+
    theme(legend.position="bottom", panel.background=element_rect(colour = NA, fill = "white"), axis.line=element_line(), panel.grid.major.y=element_line(colour="black"), panel.grid.minor=element_blank())

My level of R expertise is fairly low, so I may be missing something simple, but I can't get it to work on my own. Thanks in advance for the help.

Update: I inadvertently pasted an incorrect version of my code; the "fill" commands are now back to my best guess. An example dataset is here.

scianalysis
  • Can you provide the dataset eten please or at least a reproducible example http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?? Also, you don't have to subset for each type – Jonas Tundo Aug 14 '13 at 11:05

2 Answers


I'm guessing that you've looked at the ggplot color blind example shown here? Without your data, I can only speculate that your geom_bar calls create ambiguity regarding which layer to apply the fill changes to since your initial call to ggplot doesn't have an aes argument. Try moving all of your data into a single dataframe and reference it in the initial call to ggplot, e.g.,

ggplot(df, aes(x=cond, y=yval, fill=cond)) +
    geom_bar(stat="identity") +
    scale_fill_manual(values=cbbPalette)

where df is the dataframe containing your data and aes is the mapping between your variables. This makes it clear to ggplot that you want the fill colors of geom_bar to correspond to the data in df. There are ways to make this work with your current code, but they're unconventional for creating standard bar plots.
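
As a rough sketch of the "single dataframe" idea, several data sets can be stacked with rbind() after tagging each with its origin. The data frames eten_a and eten_b and the dataset column below are hypothetical stand-ins with made-up values, not the asker's real data:

```r
library(ggplot2)

# Hypothetical stand-ins for two of the real data sets (values are made up)
eten_a <- data.frame(Type = c("Waste Wood", "MSW"),
                     Tprod = c(1, 2), acost = c(10, 20))
eten_b <- data.frame(Type = c("MSW", "Forest Residue"),
                     Tprod = c(1, 2), acost = c(15, 5))

# Tag each data set with its origin, then stack them into one data frame
eten_a$dataset <- "A"
eten_b$dataset <- "B"
combined <- rbind(eten_a, eten_b)

cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# One fill scale is shared by all facets, so each Type keeps a single colour
p <- ggplot(combined, aes(x = Tprod, y = acost, fill = Type)) +
  geom_col() +
  scale_fill_manual(values = cbbPalette) +
  facet_grid(. ~ dataset)
```

Because the fill scale is trained on the combined data, a Type that is missing from one data set still keeps its colour in the others.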

Jay B. Martin
  • That answer works to create the sample graph I showed in the image I linked to, but it doesn't help with the real problem. I've got 5-6 similar datasets I need to produce these graphs for and I want to make sure that the color for each type stays the same across each graph. So I'd like to manually assign one color to "Waste Wood" for example and another to "MSW", so that each time I reproduce the graphs, the colors are the same. The example you provided seems to have ggplot pick which colors to associate with each type every time the script is run. – scianalysis Aug 14 '13 at 19:37
  • Ah, I see. Are you having a problem with factors? R has this annoying habit of reordering variables in a manner that's ostensibly whimsical. You can force ggplot to order your variables any way you'd like. See [here](http://kohske.wordpress.com/2010/12/29/faq-how-to-order-the-factor-variables-in-ggplot2/). Using those tricks, once you place each of your labels in the same order, each of your plots will have the same color fill order. – Jay B. Martin Aug 14 '13 at 19:50
  • That didn't work either. It definitely re-ordered the factors in the data frame, because the relationship between Type and the fill color changed, but not every type is represented in each of the datasets I'm working with. What I really need is a way to say: If Type="Y" then fill=cbbPalette[x] – scianalysis Aug 17 '13 at 22:19
  • Place your data sets into one aggregate dataframe, and then add a type column (for grouping and plotting by type later). Since your variables will be in the same columns, ggplot will map your factors to the same colors across types (even if a factor is missing in some types). Finally, create separate type plots using facet, e.g., `ggplot(df, aes(x=cond, y=yval, fill=cond)) + geom_bar(stat="identity") + scale_fill_manual(values=cbbPalette) + facet_grid(. ~ type)`. – Jay B. Martin Aug 18 '13 at 01:58

Jay B. Martin's answer doesn't fully answer the question, so although this question is quite old, here is a solution for future reference. We make some data for a reproducible example:

library(tibble)   # for tibble()
library(ggplot2)  # for the plots below

color_table <- tibble(
  Land_cover = c("Agriculture", "Forest", "Ocean", "Lake", "Populated"),
  Color = c("yellow", "darkgreen", "blue4", "lightblue", "maroon3")
)

df <- data.frame(
  Region = c(rep(1,5), rep(2,5)),
  Area_no = c(1,2,3,4,5,1,2,3,4,5),
  Land_cover = c("Agriculture", "Forest", "Agriculture", "Agriculture", "Lake", 
                 "Lake", "Populated", "Populated", "Ocean", "Populated"), 
  Square_km = c(10,15,7,12,3, 5,30,20,40,10)
  )

So, we want to use df to make a graph for each Region, where Land_cover is represented by the correct color given by color_table. First, we must make sure that the Land_cover variable in the data set df is a factor variable whose levels are in the same order as the colors we want to put on each type of land cover. We do that by using the order from color_table:

df$Land_cover <- factor(df$Land_cover, levels = color_table$Land_cover)
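
A quick way to check the effect of that line, sketched here with plain vectors (cover_levels and land are illustrative names, not part of the original code): each value's integer code is its position among the levels, which is also the position of its colour in color_table.

```r
# Same level order as color_table$Land_cover above
cover_levels <- c("Agriculture", "Forest", "Ocean", "Lake", "Populated")

# A few land-cover values, converted with explicit levels
land <- factor(c("Lake", "Populated", "Ocean"), levels = cover_levels)

levels(land)      # all five levels are kept, in the order given
as.integer(land)  # position of each value among the levels
```

Unused levels such as "Agriculture" stay in levels(land) even though they do not occur in the data, which is what the drop = FALSE solution relies on.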

Now, by far the simplest way to plot using the correct colors is, as Jay B. Martin suggests in the comments, to use facet_grid() or facet_wrap():

ggplot(df, aes(x = Area_no, y = Square_km, fill = Land_cover)) +
  geom_col() +
  scale_fill_manual(values = color_table$Color) +
  facet_grid(.~Region) 

ggplot using facet

But what if you want to make a separate plot for each Region? For instance, you want to save each plot as a separate file.

The problem

If we basically make a small loop where we select a subset of the data and reuse the code we used above (except facet_grid), we clearly get the wrong colours (shown here for Region 2):

for (region in 1:2){
  gg <- ggplot(subset(df, Region %in% region), aes(x = Area_no, y = Square_km, fill = Land_cover)) +
    geom_col() +
    scale_fill_manual(values = color_table$Color)
  # Pass the plot explicitly rather than relying on last_plot()
  ggsave(paste0("Areas_region_", region, ".png"), plot = gg, width = 5, height = 3)
}

Plot with wrong colours
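
The reason, roughly, is that the scale drops unused factor levels by default and then hands out the unnamed colours by position. The base R sketch below imitates that for Region 2; the variable names are illustrative, and it is an approximation of what the scale does rather than ggplot2's actual code:

```r
cover_levels <- c("Agriculture", "Forest", "Ocean", "Lake", "Populated")
colors <- c("yellow", "darkgreen", "blue4", "lightblue", "maroon3")

# Land cover values of Region 2 in the example data
region2 <- factor(c("Lake", "Populated", "Populated", "Ocean", "Populated"),
                  levels = cover_levels)

# Levels that actually occur, in level order
present <- levels(droplevels(region2))

# Colours handed out by position (what the loop above ends up with)
assigned <- setNames(colors[seq_along(present)], present)

# Colours we wanted, looked up by each level's position in the full table
intended <- setNames(colors[match(present, cover_levels)], present)

assigned
intended
```

Comparing assigned with intended shows that the first three colours of the table are reused for whichever three categories happen to be present.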

There are two ways to get the correct colours:

Solution 1. drop = FALSE (legend shows all categories)

Adding drop = FALSE inside scale_fill_manual is by far the simplest. You will then get the correct colours, and the legend will show all possible categories, not only those that are in the plot:

for (region in 1:2){
  gg <- ggplot(subset(df, Region %in% region), aes(x = Area_no, y = Square_km, fill = Land_cover)) +
    geom_col() +
    # drop = FALSE keeps unused factor levels, so colours stay aligned with levels
    scale_fill_manual(values = color_table$Color, drop = FALSE)
  ggsave(paste0("Areas_region_", region, ".png"), plot = gg, width = 5, height = 3)
}

Plot with correct colours and legend for all categories

Solution 2. Pick colors for each plot (legend shows only the categories shown in plot)

If for some reason you don't want the legend to show all possible categories (for instance if there is a huge number of them), you need to pick the correct colors for each plot:

for (region in 1:2){
  df_plot <- subset(df, Region %in% region)
  # Integer codes of the factor levels that occur in this subset, in level order
  actual_cover <- sort(unique(as.integer(df_plot$Land_cover)))
  gg <- ggplot(df_plot, aes(x = Area_no, y = Square_km, fill = Land_cover)) +
    geom_col() +
    scale_fill_manual(values = color_table$Color[actual_cover])
  ggsave(paste0("Areas_region_", region, "ver3.png"), plot = gg, width = 5, height = 3)
}

which results in the following plot (for Region 2):

Plot with correct colours and legend for the categories present only

What we do here is build a vector actual_cover holding the indices (between 1 and 5) of the colours that are actually used in the current plot. As a result, the legend contains only the categories present in the plot, while the colours are still correct.
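
As a possible alternative to Solution 2 (a sketch, not part of the approach described above): scale_fill_manual() matches a named values vector to the data by name, so a lookup vector built from color_table should give the same result without computing indices. The names named_colors and plots are illustrative, and the example data are repeated so the block runs on its own:

```r
library(ggplot2)

color_table <- data.frame(
  Land_cover = c("Agriculture", "Forest", "Ocean", "Lake", "Populated"),
  Color = c("yellow", "darkgreen", "blue4", "lightblue", "maroon3"),
  stringsAsFactors = FALSE
)

df <- data.frame(
  Region = c(rep(1, 5), rep(2, 5)),
  Area_no = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  Land_cover = c("Agriculture", "Forest", "Agriculture", "Agriculture", "Lake",
                 "Lake", "Populated", "Populated", "Ocean", "Populated"),
  Square_km = c(10, 15, 7, 12, 3, 5, 30, 20, 40, 10)
)

# Lookup vector: names are the categories, values are their colours
named_colors <- setNames(color_table$Color, color_table$Land_cover)

# Build one plot per region; colours are matched by name, not by position
plots <- list()
for (region in 1:2) {
  plots[[region]] <- ggplot(subset(df, Region == region),
                            aes(x = Area_no, y = Square_km, fill = Land_cover)) +
    geom_col() +
    scale_fill_manual(values = named_colors)
}
```

Each element of plots can then be saved with ggsave(filename, plot = plots[[region]]). With name matching, the order of the factor levels no longer matters for the colours.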

Dag Hjermann