I have what I think is a variant of "remove data points when using stat_summary to generate mean and confidence band" or "How to set multiple colours in a ggplot2 stat_summary plot?". It may also relate to this bug report about the SE parameter: https://github.com/tidyverse/ggplot2/issues/1546. Either way, I can't figure out what I am doing wrong.

I have weekly data and I am trying to plot the current year, the previous year, the 5-year average, and the 5-year range. I can get the plot and all the elements that I want, but I can't get the fill of the range to respond to my scale_fill command.

Here is the code I am using:

library(plyr)
require(dplyr)
require(tidyr)
library(ggplot2)
library(lubridate)
library(zoo) 
library(viridis)

  ggplot(df1,aes(week,value)) +
  geom_point(data=subset(df1,year(date)==year(Sys.Date()) ),size=1.7,aes(colour="1"))+ 
  geom_line(data=subset(df1,year(date)==year(Sys.Date()) ),size=1.7,aes(colour="1"))+ 
  geom_line(data=subset(df1,year(date)==year(Sys.Date())-1 ),size=1.7,aes(colour="2"))+
  geom_point(data=subset(df1,year(date)==year(Sys.Date())-1 ),size=1.7,aes(colour="2"))+ 
  #stat_summary(data=subset(df1,year(date)<year(Sys.Date()) &year(date)>year(Sys.Date())-6),geom = 'smooth', alpha = 0.2,size=1.7,
  #             fun.data = median_hilow,aes(colour=c("1","2","3"),fill="range"))+
  stat_summary(data=subset(df1,year(date)<year(Sys.Date()) &year(date)>year(Sys.Date())-6),geom="smooth",fun.y = mean, fun.ymin = min, fun.ymax = max,size=1.7,aes(colour="c",fill="b"))+
  #stat_summary(fun.data=mean_cl_normal, geom='smooth', color='black')+
  scale_color_viridis("",discrete=TRUE,option="C",labels=c(year(Sys.Date()), year(Sys.Date())-1,paste(year(Sys.Date())-6,"-",year(Sys.Date())-1,"\naverage",sep ="")))+
  scale_fill_viridis("",discrete=TRUE,option="C",labels=paste(year(Sys.Date())-6,"-",year(Sys.Date())-1,"\nrange",sep =""))+     
  #scale_fill_continuous()+
  scale_x_continuous(limits=c(min(df1$week),max(df1$week)),expand=c(0,0))+
  theme_minimal()+theme(
    legend.position = "bottom",
    legend.margin=margin(c(0,0,0,0),unit="cm"),
    legend.text = element_text(colour="black", size = 12),
    plot.caption = element_text(size = 14, face = "italic"),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(size = 14, face = "italic"),
    #panel.grid.minor = element_blank(),
    text = element_text(size = 14,face = "bold"),
    axis.text.y =element_text(size = 14,face = "bold", colour="black"),
    axis.text.x=element_text(size = 14,face = "bold", colour="black",angle=90, hjust=1)
  )+
  labs(y="Crude Oil Imports \n(Weekly, Thousands of Barrels per Day)",x="Week",
       title=paste("US Imports of Crude Oil",sep=""),
       caption="Source: EIA API, graph by Andrew Leach.")

I have placed a test.Rdata file here with the df1 data frame: https://drive.google.com/file/d/1aMt4WQaOi1vFJcMlgXFY7dzF_kjbgBiU/view?usp=sharing

Ideally, I'd like to have a fill legend item that looks like a filled ribbon key (the example image is not included here), only with the text as I have it in my graph.

Any help would be much appreciated.

Gregor Thomas
Andrew Leach

1 Answer

The short answer is that you seem to be misunderstanding how ggplot's scale_xx_xx commands are meant to be used (this trips up a lot of people). Whenever possible, the intention is for the aesthetics (the aes() bit inside most geoms) to be mapped to the scale functions. For example, the following code maps year to line color:

plot.simple <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
  geom_line()
print(plot.simple)

Since we specified that year (converted to a factor) should be used to define line color, ggplot defaults to using scale_color_hue. We could use a different scale:

plot.gray <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
  geom_line() +
  scale_color_grey()
print(plot.gray)

If we don't want to tie an aesthetic such as color or fill to values in the data, we can set it outside of the call to aes(). Typically you only do this when the aesthetic should take a single constant value for the whole layer, as with alpha here:

plot.simple <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
  geom_line(alpha = 0.2)
print(plot.simple)

But you're in the unenviable position of wanting both of these things at once. For your 2017 and 2018 lines, color is meaningful. For the summary ribbon and its associated line, color is just decorative. In such cases, I usually avoid ggplot's built-in summary functions, since they can often "help" in ways that end up confusing or cumbersome.

I would suggest creating two data sets, one containing the 2017 and 2018 years, and the other containing the summary statistics for the ribbon:

df.years <- df1 %>% 
  mutate(year = year(date)) %>% 
  filter(year >= year(Sys.Date()) - 1)

df.year.range <- df1 %>% 
  mutate(year = year(date)) %>% 
  filter(year >= year(Sys.Date()) - 5 & year <= year(Sys.Date()) - 1) %>% 
  group_by(week) %>% 
  summarize(mean = mean(value), min = min(value), max = max(value))
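
For readers without the test.Rdata file, the summary step can be tried on made-up data. This is only a sketch: toy, toy.range, and the simulated values are hypothetical stand-ins for df1, and it assumes "previous five years" means the five calendar years before the current one.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for df1: one simulated value per week
# for the current year and the six years before it.
this_year <- year(Sys.Date())
set.seed(1)
toy <- expand.grid(week = 1:52, yr = (this_year - 6):this_year)
toy$date <- as.Date(paste0(toy$yr, "-01-01")) + 7 * (toy$week - 1)
toy$value <- 7000 + 500 * sin(toy$week / 8) + rnorm(nrow(toy), sd = 300)

# Weekly mean, min and max over the five years before the current one.
# n_years is an extra column added here only to check the window size.
toy.range <- toy %>%
  mutate(year = year(date)) %>%
  filter(year >= this_year - 5, year <= this_year - 1) %>%
  group_by(week) %>%
  summarize(mean = mean(value), min = min(value), max = max(value),
            n_years = n())

head(toy.range)
```

Each week should show n_years equal to 5. One thing to watch for, given that the question also loads plyr: if plyr is attached after dplyr, its summarize() masks dplyr's and ignores the grouping, so calling dplyr::summarize() explicitly is the safer choice in that case.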

We can then trick ggplot into printing a nice label for the ribbon in the legend by setting fill inside aes() to the intended string. Because fill is now a mapped aesthetic, its color is controlled by a scale, here scale_fill_manual.

the.plot <- ggplot() +
  geom_ribbon(data = df.year.range, aes(x = week, ymin = min, ymax = max, fill = 'Previous 5 Year Range\nof Weekly Imports')) +
  geom_line(data = df.year.range, aes(x = week, y = mean), color = 'purple') +
  geom_line(data = df.years, aes(x = week, y = value, color = as.factor(year))) +
  geom_point(data = filter(df.years, year == year(Sys.Date())), aes(x = week, y = value, color = as.factor(year))) +
  scale_fill_manual(values = '#ffccff')
print(the.plot)

This is still rather cumbersome, because you have quite a few different elements tied to various different sources of data (lines for some years, points for others, a ribbon for a summary, etc). But it gets the job done!
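
The same trick can be extended so the average line also gets a legend entry, which is closer to the layout asked for in the question. The following is a minimal sketch on simulated data, not a drop-in replacement: toy, toy.years, toy.range, the label strings, and the hex colours are all assumptions chosen for illustration. The idea is to map colour and fill to label strings inside aes() and then pin each label to a colour with named values in scale_colour_manual and scale_fill_manual.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for df1 (simulated weekly values).
this_year <- year(Sys.Date())
set.seed(1)
toy <- expand.grid(week = 1:52, year = (this_year - 5):this_year)
toy$value <- 7000 + 500 * sin(toy$week / 8) + rnorm(nrow(toy), sd = 300)

# Current and previous year as individual lines.
toy.years <- filter(toy, year >= this_year - 1)

# Weekly mean/min/max over the five years before the current one.
toy.range <- toy %>%
  filter(year <= this_year - 1) %>%
  group_by(week) %>%
  summarize(mean = mean(value), min = min(value), max = max(value))

# Legend labels built from the years actually summarised.
avg_label <- paste0(this_year - 5, "-", this_year - 1, "\naverage")
range_label <- paste0(this_year - 5, "-", this_year - 1, "\nrange")

# Named vector: each legend label is tied to one (arbitrary) colour.
line_colours <- setNames(
  c("#0D0887", "#CC4678", "#F0F921"),
  c(as.character(this_year), as.character(this_year - 1), avg_label)
)

legend.plot <- ggplot() +
  geom_ribbon(data = toy.range,
              aes(x = week, ymin = min, ymax = max, fill = range_label),
              alpha = 0.4) +
  geom_line(data = toy.range, aes(x = week, y = mean, colour = avg_label)) +
  geom_line(data = toy.years,
            aes(x = week, y = value, colour = as.character(year))) +
  scale_colour_manual(name = NULL, values = line_colours,
                      breaks = names(line_colours)) +
  scale_fill_manual(name = NULL, values = setNames("grey70", range_label)) +
  theme_minimal() +
  theme(legend.position = "bottom")
print(legend.plot)
```

Because the labels are computed from this_year, the legend text stays in step with the years that were summarised, and the breaks argument fixes the order of the colour legend entries.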

jdobres