I'm working on recreating some plots I used to do with Numbers (Mac) in R for efficiency and automation purposes.

I would like to achieve something like the following:

[image: example of the desired plot]

With some help I was able to set the distance between the legend and the plot itself, and I also wrote a function to do the plotting (see this post for more information).

Now I am stuck with the trendlines and am facing the following problem: how can I force geom_smooth() to start and end at the bar of the group the trendline belongs to? By default, all trendlines seem to start and end in the middle of the groups.

My current plot looks like this: [image: current plot]

And here is an MWE to reproduce the main/relevant parts of the plot:

library(tidyverse)

set.seed(9)

data <- data.frame(Cat=c(rep("A",times=5),rep("B",times=5),rep("C", times=5)),
                   year=rep(c(2015,2016,2017,2018,2019),times=3),
                   count=c(sample(seq(60,80),replace=TRUE,size=5),
                           sample(seq(100,140),replace=TRUE,size=5),
                           sample(seq(20,30),replace=TRUE,size=5)))

legend_labels <- c("A","B","C")

ggplot(data=data,aes(x=year, y=count, fill=Cat)) +
  geom_col(position="dodge",show.legend=TRUE) + 
  geom_smooth(aes(color=Cat),method="lm", formula=y~x,se=FALSE,show.legend=TRUE,linetype="dashed") +
  scale_fill_manual(values = c("#CF232B","#942192","#000000"),labels=legend_labels) +
  scale_color_manual(values= c("#CF232B","#942192","#000000"),labels=paste("Trend ",legend_labels,sep="")) +
  theme(panel.background = element_rect(fill="white"),
    legend.title = element_blank(),
    legend.key = element_rect(fill="white")) +
  guides(color=guide_legend(override.aes=list(fill=NA)),fill=guide_legend(override.aes = list(linetype=0)))

Note: Since I don't think they're relevant to the problem, many adjustments and helper functions are not included in this MWE.

Peter
thuettel

1 Answer

You could use layer_data() to extract the coordinates of each bar, fit the lm per group, and draw the lines with geom_segment():

library(purrr)
library(dplyr)

p <- ggplot(data=data,aes(x=year, y=count, fill=Cat)) +
  geom_col(position="dodge",show.legend=TRUE) + 
  scale_fill_manual(values = c("#CF232B","#942192","#000000"),labels=legend_labels) +
  scale_color_manual(values= c("#CF232B","#942192","#000000"),labels=paste("Trend ",legend_labels,sep="")) +
  theme(panel.background = element_rect(fill="white"),
        legend.title = element_blank(),
        legend.key = element_rect(fill="white")) +
  guides(color=guide_legend(override.aes=list(fill=NA)),fill=guide_legend(override.aes = list(linetype=0)))



data_layer <- layer_data(p)

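segment <- function(data){
  # fit a linear trend through the dodged bar centres of one group
  model <- lm(y~x,data)
  x <- min(data$x)
  xend <- max(data$x)
  # predict at the two end points of the segment
  y <- predict(model, data.frame(x=c(x,xend)))
  yend <- y[2]
  y <- y[1]
  data.frame(x=x,xend=xend,y=y,yend=yend,color=data$fill[1])
}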

segments <- data_layer %>% split(.$fill) %>% map_dfr(~segment(.x))
CatLookup <- setNames(c("A","B","C"), c("#CF232B","#942192","#000000"))

segments <- segments %>% mutate(Cat = coalesce(CatLookup[color], color))

p + geom_segment(data=segments, aes(x=x,xend=xend,y=y,yend=yend,color=Cat),linetype=2)
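
For readers unfamiliar with the split() / map_dfr() step above, here is a minimal sketch of the same pattern on a toy data frame. The names `toy`, `pieces` and `slopes` are made up for illustration: split() cuts the data into one data frame per group, and map_dfr() calls the function on each piece and row-binds the resulting data frames (it needs dplyr installed for the binding).

```r
library(purrr)

# toy data: two groups with two points each (hypothetical values)
toy <- data.frame(grp = c("a", "a", "b", "b"),
                  x   = c(1, 2, 1, 2),
                  y   = c(1, 3, 10, 14))

# split() returns a named list with one data frame per group
pieces <- split(toy, toy$grp)

# map_dfr() applies the formula-style function to each piece (available as .x)
# and stacks the one-row results into a single data frame
slopes <- map_dfr(pieces, ~ data.frame(grp   = .x$grp[1],
                                       slope = coef(lm(y ~ x, .x))[["x"]]))
slopes
```

In the answer, `segment()` plays the role of the inline function here: it receives the bars of one fill colour and returns one row describing that group's trend segment.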

[image: resulting plot]

Waldi
  • Thank you, and sorry for getting back to you so late, but I finally managed to run your code. 1. I think it needs to be `y <- predict(model, data.frame(x=c(x,xend)))` instead of `x=c(xmin,xmax)`. At least I encounter an error otherwise. 2. I don't really understand how `map_dfr(~segment(.x))` works. I need to look into that further. 3. As I understand `geom_segment()`, this will only work for straight lines. I will have other graphs where formulas like `y~poly(x,3)` will be used for trendlines. Because of (3) this is a nice workaround for the MWE, but unfortunately it won't solve my general problem. – thuettel May 05 '21 at 07:28
  • I am quite surprised that no one has run into this problem before, or at least I could not find other posts/threads relating to it. I find this behaviour of `geom_smooth` rather strange. Is there any logical explanation why it would be good to start from the middle of the group rather than from the categories within the groups? – thuettel May 05 '21 at 07:29
  • I agree, this is quite hacky, but your requirement doesn't seem to fit in ggplot's general philosophy. `c(xmin,xmax)` is to predict the two extremities of the segment, which are used to draw the line. We could generate a dataframe for a more complex `geom_line`. I'll try to put up an example of this if I find some time. – Waldi May 05 '21 at 07:35
  • Would it be possible to extract the line from `geom_smooth()` and just move it horizontally so it lines up with the corresponding bars in the first and last group? That's how I would do it in case of image manipulation. – thuettel May 05 '21 at 08:07
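
The `geom_line` idea from the comments could look roughly like the sketch below. It is only one possible approach, not a built-in ggplot2 feature: the helper `trend_curve()` and the objects `bars` and `curves` are hypothetical names, and the mapping from `group` back to `Cat` assumes that ggplot2 numbers the groups in sorted order of `Cat`. The background is that geom_smooth() fits against the mapped x (the year) and is not dodged, whereas the bars are moved sideways by position="dodge". Fitting on the dodged bar centres instead gives, for polynomial fits like these, the same curve as shifting the geom_smooth() line horizontally by each group's dodge offset.

```r
library(ggplot2)

# same data as in the question's MWE
set.seed(9)
data <- data.frame(
  Cat   = rep(c("A", "B", "C"), each = 5),
  year  = rep(2015:2019, times = 3),
  count = c(sample(60:80,   5, replace = TRUE),
            sample(100:140, 5, replace = TRUE),
            sample(20:30,   5, replace = TRUE))
)

p <- ggplot(data, aes(x = year, y = count, fill = Cat)) +
  geom_col(position = "dodge")

# one row per bar: x is the dodged bar centre, y the bar height
bars <- layer_data(p)

# hypothetical helper: fit one group on its dodged x positions
# and predict the trend on a fine grid between first and last bar
trend_curve <- function(d, formula = y ~ poly(x, 3), n = 50) {
  model <- lm(formula, data = d)
  grid  <- data.frame(x = seq(min(d$x), max(d$x), length.out = n))
  grid$y     <- predict(model, newdata = grid)
  grid$group <- d$group[1]
  grid
}

curves <- do.call(rbind, lapply(split(bars, bars$group), trend_curve))

# assumption: group 1, 2, 3 correspond to the sorted values of Cat
curves$Cat <- sort(unique(data$Cat))[curves$group]

p + geom_line(data = curves, aes(x = x, y = y, colour = Cat),
              linetype = "dashed", inherit.aes = FALSE)
```

Passing `formula = y ~ x` to `trend_curve()` should reproduce the straight segments from the answer, and the manual fill and colour scales from the question can be added to `p` as before.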