
I am creating a number of plots using ggplot2 in R and want a way to standardize implementation of a cutoff line. I have data on a number of different measures for four cities over a ~10 year time period. I've plotted them as line graphs with each city a different color within a given graph. I will be creating a plot for each of the different measures I have (around 20).

On each of these graphs, I need to put two cutoff lines (with a word next to them) representing implementation of some policy so that people reading the graphs can easily identify the difference between performance before and after the implementation. Below is approximately the code I'm currently using.

gg_plot1<- ggplot(data=ggdata, aes(x=Year, y=measure1, group=Area, color=Area)) +
  geom_vline(xintercept=2011, color="#EE0000") +
  geom_text(aes(x=2011, label="City1\n", y=0.855), color="#EE0000", angle=90, hjust=0, family="serif") +
  geom_vline(xintercept=2007, color="#000099") +
  geom_text(aes(x=2007, label="City2", y=0.855), color="#000099", angle=0, hjust=1, family="serif") +
  geom_line(size=.75) +
  geom_point(size=1.5) +
  scale_y_continuous(breaks=round(seq(min(ggdata$measure1, na.rm=T), max(ggdata$measure1, na.rm=T), by=0.01), 2)) +
  scale_x_continuous(breaks=min(ggdata$Year):max(ggdata$Year)) +
  scale_color_manual(values=c("#EE0000", "#00DDFF", "#009900", "#000099")) +
  theme(axis.text.x = element_text(angle=90, vjust=1),
        panel.background = element_rect(fill="white", color="white"),
        panel.grid.major = element_line(color="grey95"),
        text = element_text(size=11, family="serif"))

The problem with this implementation is that it relies on placing the two geom_text() on a particular place on the specific graph. These different measures all have different ranges so in order to do this I'd need to go plot by plot and find a spot to place them. What I'd prefer to do is something like force the range of each plot down by X% and put the geom_text() aligned to the bottom of the range. The lines shouldn't need adjusting (same year in every plot), just the position of the text. I found some similar questions here but none that had to do with the specific problem of placing something in the same position on different graphs with different ranges.

Is there a way to do what I'm looking for? If I had to guess, it'd be something like using relative positioning rather than absolute, but I haven't been able to find a way to do that within ggplot. For the record, I'm aware the two geom_text()s are oriented differently. I did that to compare which we preferred but left it for you all. We will ultimately be going with the one that has the text rotated 90 degrees. Additionally, some of these will be faceted together, so that might add an extra layer of difficulty. I haven't gotten to that point yet.

Sidebar: an alternative way to visualize this would be to change the line from solid to dotted at the cutoff year. Is this possible? I'm not sure the client would want that but I'd love to present it as an option if anyone can point me in the direction of where to learn about how to do that.

Edit to add:

Sample data which shows what happens when running it with different y-ranges

ggdata <- data.frame(Area=rep(c("City1", "City2", "City3", "City4"), times=7),
                     Year=c(rep(2006,4), rep(2007,4), rep(2008,4), rep(2009,4), rep(2010,4), rep(2011,4), rep(2012,4)),
                     measure1=rnorm(28,10,2),
                     measure2=rnorm(28,50,10))

Sample plot which has the geom_text()s in the proper position, but this was done using the code above with a fixed position within the plot. When I replicate the code using a different measure that has a different y-range, it ends up stretching the plot window.

cparmstrong
  • Please post example data and wanted output example. – pogibas Sep 22 '17 at 18:37
  • Just updated it – cparmstrong Sep 22 '17 at 19:00
  • Your data frame code produces an error and the plot code would produce an error, even if the data frame error was resolved. Please update the example so that the code works. – eipi10 Sep 22 '17 at 19:09
  • Sorry about that, just updated again. The embedded image shows the code when it runs with the original data. The sample data runs with the sample code and shows the problem I'm trying to fix. – cparmstrong Sep 22 '17 at 19:20
  • try: `geom_text(aes(x=2011, label="City1\n", y=min(measure1)*1.1))` – missuse Sep 22 '17 at 19:26
  • FYI, to shorten the data frame code: `Year=rep(2006:2012,each=4)` and `Area=rep(paste0("City",1:4), 7)` – eipi10 Sep 22 '17 at 19:48

1 Answer


You can use the y-range of the data to position the text labels. I've set the y-limits explicitly in the example below, but that's not absolutely necessary unless you want to change them from the defaults. You can also adjust the x-position of the text labels using the x-range of the data. The code below will position the labels at the bottom of the plot, regardless of the y-range of the data.

I've also switched from geom_text to annotate. geom_text overplots the text labels multiple times, once for each row in the data. annotate plots the label once.
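As a quick check of the overplotting point, here is a minimal sketch that counts how many labels each approach actually draws. The names `base`, `p_text`, `p_annot`, `n_text` and `n_annot` are made up for this illustration, and the label position (y = 10) is arbitrary.

```r
library(ggplot2)

set.seed(4)
ggdata <- data.frame(Year = rep(2006:2012, each = 4),
                     Area = rep(paste0("City", 1:4), 7),
                     measure1 = rnorm(28, 10, 2))

base <- ggplot(ggdata, aes(x = Year, y = measure1, group = Area, color = Area))

# geom_text() with constants inside aes(): the label is recycled over every data row
p_text <- base + geom_text(aes(x = 2011, y = 10, label = "City1"))

# annotate(): the label gets its own one-row data frame
p_annot <- base + annotate("text", x = 2011, y = 10, label = "City1")

# Number of rows each text layer will draw
n_text  <- nrow(layer_data(p_text, 1))
n_annot <- nrow(layer_data(p_annot, 1))
```

With the 28-row sample data, `n_text` is 28 and `n_annot` is 1, which is why the geom_text version looks heavier and more jagged.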

ypos = min(ggdata$measure1) + 0.005*diff(range(ggdata$measure1))
xv = 0.02
xh = 0.01
xadj = diff(range(ggdata$Year))

ggplot(data=ggdata, aes(x=Year, y=measure1, group=Area, color=Area)) +
  geom_vline(xintercept=2011, color="#EE0000") +
  geom_vline(xintercept=2007, color="#000099") +
  geom_line(size=.75) +
  geom_point(size=1.5) +
  annotate(geom="text", x=2011 - xv*xadj, label="City1", y=ypos, color="#EE0000", angle=90, hjust=0, family="serif") +
  annotate(geom="text", x=2007 - xh*xadj, label="City2", y=ypos, color="#000099", angle=0, hjust=1, family="serif") +
  scale_y_continuous(limits=range(ggdata$measure1),
                     breaks=round(seq(min(ggdata$measure1, na.rm=T), max(ggdata$measure1, na.rm=T), by=1), 0)) +
  scale_x_continuous(breaks=min(ggdata$Year):max(ggdata$Year)) +
  scale_color_manual(values=c("#EE0000", "#00DDFF", "#009900", "#000099")) +
  theme(axis.text.x = element_text(angle=90, vjust=1),
        panel.background = element_rect(fill="white", color="white"),
        panel.grid.major = element_line(color="grey95"),
        text = element_text(size=11, family="serif"))
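An alternative that avoids computing the position from the data at all is to pass `-Inf` as the y-value, which ggplot2 places at the lower edge of the panel. The sketch below assumes only the City1 label is needed and uses a made-up object name (`p_inf`). The `vjust` value is a guess that should nudge the rotated text off the line, so adjust it to taste.

```r
library(ggplot2)

set.seed(4)
ggdata <- data.frame(Year = rep(2006:2012, each = 4),
                     Area = rep(paste0("City", 1:4), 7),
                     measure1 = rnorm(28, 10, 2))

p_inf <- ggplot(ggdata, aes(x = Year, y = measure1, group = Area, color = Area)) +
  geom_vline(xintercept = 2011, color = "#EE0000") +
  geom_line() +
  geom_point() +
  # y = -Inf pins the label to the bottom edge of the panel, whatever the y-range.
  # With angle = 90, hjust = 0 makes the text start at that edge and run upward;
  # a negative vjust moves it away from the vertical line (illustrative value).
  annotate("text", x = 2011, y = -Inf, label = " City1",
           color = "#EE0000", angle = 90, hjust = 0, vjust = -0.5)
```

Infinite values are ignored when the axis limits are computed, so the label should not stretch the plot. Because the position is relative to each panel, the same layer should also work when the plots are faceted with free y-scales.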


UPDATE: To respond to your comment, here's how you can create a separate plot for each "measure" column in your data frame.

First, we create reproducible data with three measure columns:

library(ggplot2)
library(gridExtra)
library(scales)

set.seed(4)
ggdata <- data.frame(Year=rep(2006:2012,each=4),
                     Area=rep(paste0("City",1:4), 7),
                     measure1=rnorm(28,10,2),
                     measure2=rnorm(28,50,10),
                     measure3=rnorm(28,-50,5))

Now, we take the code from above and package it in a function. The function takes an argument called measure_var. This is the name of the data column, provided as a character string, that will supply the y-values for the plot. Note that we now use aes_string instead of aes inside ggplot.

plot_func = function(measure_var) {

  ypos = min(ggdata[ , measure_var]) + 0.005*diff(range(ggdata[ , measure_var]))
  xv = 0.02
  xh = 0.01
  xadj = diff(range(ggdata$Year))

  ggplot(data=ggdata, aes_string(x="Year", y=measure_var, group="Area", color="Area")) +
    geom_vline(xintercept=2011, color="#EE0000") +
    geom_vline(xintercept=2007, color="#000099") +
    geom_line(size=.75) +
    geom_point(size=1.5) +
    annotate(geom="text", x=2011 - xv*xadj, label="City1", y=ypos, 
             color="#EE0000", angle=90, hjust=0, family="serif") +
    annotate(geom="text", x=2007 - xh*xadj, label="City2", y=ypos, 
             color="#000099", angle=0, hjust=1, family="serif") +
    scale_y_continuous(limits=range(ggdata[ , measure_var]),
                       breaks=pretty_breaks(5)) +
    scale_x_continuous(breaks=min(ggdata$Year):max(ggdata$Year)) +
    scale_color_manual(values=c("#EE0000", "#00DDFF", "#009900", "#000099")) +
    theme(axis.text.x = element_text(angle=90, vjust=1),
          panel.background = element_rect(fill="white", color="white"),
          panel.grid.major = element_line(color="grey95"),
          text = element_text(size=11, family="serif")) +
    ggtitle(paste("Plot of", measure_var))
}
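In recent ggplot2 releases aes_string is deprecated, and the `.data` pronoun inside a regular `aes()` is the suggested replacement. Below is a stripped-down sketch of the same idea. `plot_func_data` is a hypothetical name, and the cutoff lines, labels and theme are omitted to keep it short.

```r
library(ggplot2)

set.seed(4)
ggdata <- data.frame(Year = rep(2006:2012, each = 4),
                     Area = rep(paste0("City", 1:4), 7),
                     measure1 = rnorm(28, 10, 2),
                     measure2 = rnorm(28, 50, 10))

# Hypothetical variant of plot_func: the column name still arrives as a string,
# but it is looked up with .data[[...]] instead of aes_string()
plot_func_data <- function(measure_var) {
  ggplot(ggdata, aes(x = Year, y = .data[[measure_var]],
                     group = Area, color = Area)) +
    geom_line() +
    geom_point() +
    labs(y = measure_var, title = paste("Plot of", measure_var))
}

p2 <- plot_func_data("measure2")
```

The `labs(y = ...)` call is there because the default axis title would otherwise show the `.data[[measure_var]]` expression rather than the column name.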

We can now run the function once like this: plot_func("measure1"). However, let's run it on all the measure columns in one go by using lapply. We give lapply a vector with the names of the measure columns (names(ggdata)[grepl("measure", names(ggdata))]), and it runs plot_func on each of these columns in turn, storing the resulting plots in the list plot_list.

plot_list = lapply(names(ggdata)[grepl("measure", names(ggdata))], plot_func)
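The key detail is that lapply iterates over the column names, not over the values in a column. A small base-R sketch of the difference follows, with made-up variable names (`measure_cols`, `measure_cols_alt`, `by_name`, `by_value`) and `identity` standing in for the plotting function:

```r
set.seed(4)
ggdata <- data.frame(Year = rep(2006:2012, each = 4),
                     Area = rep(paste0("City", 1:4), 7),
                     measure1 = rnorm(28, 10, 2),
                     measure2 = rnorm(28, 50, 10),
                     measure3 = rnorm(28, -50, 5))

# Names of the measure columns, as in the call above
measure_cols <- names(ggdata)[grepl("measure", names(ggdata))]

# Equivalent selection with grep(value = TRUE), anchored to the start of the name
measure_cols_alt <- grep("^measure", names(ggdata), value = TRUE)

# Iterating over names: one call per measure column (3 here)
by_name <- lapply(measure_cols, identity)

# Iterating over a column: one call per ROW of that column (28 here),
# each receiving a single number rather than a column name
by_value <- lapply(ggdata$measure1, identity)
```

Passing `ggdata$measure1` would therefore hand the plotting function individual numbers, which is not what a function expecting a column name can work with.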

Now if we wish, we can lay them all out together using grid.arrange. In this case, we only need one legend, rather than a separate legend for each plot, so we extract the legend as a separate graphical object and lay it out beside the three plots.

# Function to get legend from a ggplot as a separate graphical object
# Source: https://github.com/tidyverse/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs/047381b48b0f0ef51a174286a595817f01a0dfad
g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}

# Get legend
leg = g_legend(plot_list[[1]])

# Lay out all of the plots together with a single legend
grid.arrange(arrangeGrob(grobs=lapply(plot_list, function(x) x + guides(colour="none"))),
             leg,
             ncol=2, widths=c(10,1))
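On the sidebar in the question about switching the line from solid to dotted at the cutoff year: one possible approach is to split each city's series into a "before" and an "after" segment and map linetype to that segment. This is only a sketch under assumptions. It uses a single cutoff year, and the names `cutoff`, `seg`, `period` and `p_dotted` are invented for the illustration.

```r
library(ggplot2)

set.seed(4)
ggdata <- data.frame(Year = rep(2006:2012, each = 4),
                     Area = rep(paste0("City", 1:4), 7),
                     measure1 = rnorm(28, 10, 2))

cutoff <- 2011  # assumed single cutoff year, for illustration only

# Split the data into two segments. Rows for the cutoff year go into BOTH
# segments so the solid and dotted pieces meet at the cutoff without a gap.
before <- ggdata[ggdata$Year <= cutoff, ]
before$period <- "before"
after <- ggdata[ggdata$Year >= cutoff, ]
after$period <- "after"
seg <- rbind(before, after)
seg$period <- factor(seg$period, levels = c("before", "after"))

p_dotted <- ggplot(seg, aes(x = Year, y = measure1, color = Area)) +
  # One line per city and period, with the linetype chosen by period
  geom_line(aes(linetype = period, group = interaction(Area, period))) +
  # Points come from the original data so the cutoff year is not drawn twice
  geom_point(data = ggdata) +
  scale_linetype_manual(values = c(before = "solid", after = "dotted"))
```

If the cutoff differs by city, the same idea should carry over by computing `period` per city before binding the two pieces together.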


eipi10
  • I think this is going to work perfectly. Tried it out with two different measures and it did so far, now to just replicate 20 more times! Thanks for your help, especially the tip about `geom_text()` versus `annotate(geom())`. I had been wondering why the text looked so ugly. – cparmstrong Sep 22 '17 at 19:37
  • If you want to do this over and over, package it into a function that takes a data frame. Then run the function for all of your data frames. If you load the data frames into a list, this will be even easier: `lapply(data_frame_list, my_plot_function)`. – eipi10 Sep 22 '17 at 19:38
  • I've never written a proper function before but I'll do some searching and give it a shot. Do you know of a particular tutorial to help? – cparmstrong Sep 22 '17 at 20:00
  • Check out the [R for Data Science](http://r4ds.had.co.nz/), by Hadley Wickham (author of ggplot2 and several other widely used packages). It's freely available on the web and also can be purchased in hardcopy. Chapter 19 discusses functions. – eipi10 Sep 22 '17 at 20:03
  • All the data is in one data.frame with the two reference vars (Area, Year) and about 20 columns of different measures. I tried `plot_graphs <- function(x) { code_from_above_with_cols_replaced_with_x }` and then `lapply(ggdata$measure1, plot_graphs)` but am getting error messages that seem to center around one problem: "Unknown or uninitialised column: 'measure'." Am I on the right track? – cparmstrong Sep 22 '17 at 20:27