
There are existing questions asking about labeling a single geom_abline() in ggplot2:

None of these addresses my use case: adding multiple reference lines to a scatter plot so that points can easily be categorized by slope range. Here is a reproducible example of the plot:

library(ggplot2)

set.seed(123)
df <- data.frame(
  x = runif(100, 0, 1),
  y = runif(100, 0, 1))

lines <- data.frame(
  intercept = rep(0, 5),
  slope = c(0.1, 0.25, 0.5, 1, 2))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(aes(intercept = intercept, slope = slope),
              linetype = "dashed", data = lines)
p

[Figure: scatter plot with dashed ablines added at various slopes]

Since the other questions gave me no way to do this programmatically, I scaled up the manual approach with a data frame of label positions, which I found by trial and error.

labels <- data.frame(
  x = c(rep(1, 3), 0.95, 0.47),
  y = c(0.12, 0.28, 0.53, 1, 1),
  label = lines$slope)

p + geom_text(aes(label = label), color = "red", data = labels)

[Figure: plot with the ablines labeled with their slope values in red text]

Is there a better way than trial and error? While this wasn't too bad with 5 lines, I still had to redo the tweaking on export, because the plot's aspect ratio and spacing differed between prototyping in an R session and the generated image. Programmatic labeling would be a huge help.

Some thoughts:

  • I wondered whether a label-position parameter could take values in the range c(0, 1), corresponding to the position along the line
  • could the min/max x/y positions be extracted from the ggplot2 object internals (which I'm not familiar with) as a "cheat" for figuring out the position? Essentially, if I know the pixel location of (0, intercept), I already know the slope, so for this example I just need to know the pixel position of max(x) or max(y), depending on which edge of the plot the line hits first
  • this struck me as similar to ggrepel, which figures out how to label points while trying to avoid overlaps
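A minimal base-R sketch of the first two ideas, assuming the visible region is just the data range c(0, 1) on both axes (ggplot2's default axis expansion is ignored here). The helper `point_along_line()` is hypothetical, not part of ggplot2: it clips each line y = intercept + slope * x to that rectangle and returns the point at fraction `t` of the visible segment (0 = where the line enters, 1 = where it leaves), so no pixel positions are needed.

```r
# Hypothetical helper (not part of ggplot2): for each line
# y = intercept + slope * x, return the point at fraction t along the
# segment visible inside xlim x ylim (t = 0 entry, t = 1 exit).
point_along_line <- function(intercept, slope, t = 1,
                             xlim = c(0, 1), ylim = c(0, 1)) {
  # x where each line crosses the lower and upper y-limits
  x_at_ylo <- (ylim[1] - intercept) / slope
  x_at_yhi <- (ylim[2] - intercept) / slope
  # horizontal lines never cross those limits: the whole x-range qualifies
  flat <- slope == 0
  x_at_ylo[flat] <- -Inf
  x_at_yhi[flat] <- Inf
  # visible x-interval = x-limits intersected with the y-constrained interval
  x_start <- pmax(xlim[1], pmin(x_at_ylo, x_at_yhi))
  x_end   <- pmin(xlim[2], pmax(x_at_ylo, x_at_yhi))
  # lines that miss the rectangle entirely get NA coordinates
  hidden <- x_start > x_end |
    (flat & (intercept < ylim[1] | intercept > ylim[2]))
  x <- x_start + t * (x_end - x_start)
  x[hidden] <- NA
  data.frame(x = x, y = intercept + slope * x)
}

lines <- data.frame(intercept = rep(0, 5), slope = c(0.1, 0.25, 0.5, 1, 2))
ends <- point_along_line(lines$intercept, lines$slope, t = 1)
ends
```

For the question's lines, `t = 1` gives the points where each line leaves the plot, e.g. (0.5, 1) for slope 2. The result could be used as the `data` of a `geom_text()` layer, for instance `cbind(lines, point_along_line(lines$intercept, lines$slope, t = 0.9))`, with a small nudge so the text sits just off the line.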
Hendy
  • Wrt your 2nd point, if you set `x = Inf` or `y = -Inf`, the positions are translated to the xmax and ymin positions respectively. – teunbrand Jan 17 '22 at 20:21
  • [geomtextpath](https://github.com/AllanCameron/geomtextpath) is a fairly new package that I haven't had a chance to try out yet, but seems like it should do this – camille Jan 17 '22 at 20:45
  • @teunbrand Interesting idea, though I think I need the "real values" for my idea. I was thinking that given the start point `(0, intercept)`, I could then figure out that the label (assuming placement at the "end" of the line) would be at `(0, intercept) + (xmax, xmax * slope)`. If `Inf` is only converted on the fly, would this work? – Hendy Jan 18 '22 at 14:17

2 Answers


This was a good opportunity to check out the new geomtextpath, which looks really cool. It's got a bunch of geoms to place text along different types of paths, so you can project your labels onto the lines.

However, I couldn't figure out a good way to set the hjust parameter the way you wanted: the text is aligned based on the range of the plot rather than on the visible part of the path the text sits along. In this case, the default hjust = 0.5 places the labels at x = 0.5, because the x-range is 0 to 1; a different range would give a different position. You can adjust hjust, but I quickly ended up with labels leaving the plot area. If having the labels at or near the middle is okay, this option looks pretty nice.

library(ggplot2)
library(geomtextpath)
library(dplyr)

# identical setup from the question

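p +
  # text_only = TRUE draws just the labels (the dashed lines already come from p);
  # offset shifts the text slightly away from the line
  geom_textabline(aes(intercept = intercept, slope = slope, label = as.character(slope)),
                  data = lines, gap = FALSE, offset = unit(0.2, "lines"), text_only = TRUE)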
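If hjust really is measured along the full panel x-range, as described above, you can work backwards from where you want each label. A rough sketch under that assumption, using the unexpanded range c(0, 1) as the panel x-range; `hjust_for()` is a hypothetical helper, and it only handles positive slopes for lines that enter at the left edge, as in the question.

```r
# Hypothetical helper: hjust that puts a label a fraction `frac` of the way
# along the visible part of y = intercept + slope * x, ASSUMING hjust is
# measured along the full panel x-range. Positive slopes only, and the line
# is assumed to enter the panel at its left edge.
hjust_for <- function(intercept, slope, frac = 0.9,
                      xrange = c(0, 1), ymax = 1) {
  # the visible segment ends at the right edge or the top edge, whichever comes first
  x_end <- pmin((ymax - intercept) / slope, xrange[2])
  x_target <- xrange[1] + frac * (x_end - xrange[1])
  # convert the target x into a fraction of the panel x-range
  (x_target - xrange[1]) / diff(xrange)
}

h <- hjust_for(0, c(0.1, 0.25, 0.5, 1, 2), frac = 0.9)
h
```

Values computed this way could replace hand-tuned numbers, for example in the scale_hjust_manual() approach from the other answer. Because ggplot2 expands the axes by default, the real panel range is slightly wider than the data range, so passing the expanded range as `xrange` should be closer to what is drawn.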

Alternatively, since you already have the equations of your lines, you can use some algebra to find your coordinates. Solve for x where y is at its maximum, and solve for y where x is at its maximum; for each of those, use pmin to keep the value within the chart. For example, the line with slope = 0.5 doesn't reach y = 1 until x = 2, which is outside the chart, so it is limited to the plot's maximum x. How you define that maximum is up to you: it could be the maximum in the data, which you can also extract from the saved plot object (I'm not sure whether there are cases where the two differ), or it could be taken from the panel layout or breaks. There are even more ideas at How can I extract plot axes' ranges for a ggplot2 object?.

# y = intercept + slope * x
xmax <- max(df$x)
# or layer_scales(p)$x$get_limits()[2] for the data range
# or ggplot_build(p)$layout$panel_params[[1]]$x.range[2] for the panel range (includes axis expansion)
ymax <- max(df$y)
lines_calc <- lines %>%
  mutate(xcalc = pmin((ymax - intercept) / slope, xmax),
         ycalc = pmin(intercept + slope * xmax, ymax))

p +
  geom_text(aes(x = xcalc, y = ycalc, label = as.character(slope)),
            data = lines_calc, vjust = 0, nudge_y = 0.02)
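If you want the labels at the edge of the panel rather than at the data maximum, you can approximate the panel range without building the plot. This sketch assumes ggplot2's default expansion for continuous scales (5% of the data range on each side, i.e. `expansion(mult = 0.05)`), so it should only agree with the `panel_params` range when no limits or expansion have been set manually. `expanded_range()` is a hypothetical helper, and base R is used instead of dplyr to keep the block self-contained.

```r
# Hypothetical helper: data range widened by ggplot2's default
# continuous-axis expansion (5% of the range on each side)
expanded_range <- function(v, mult = 0.05) {
  r <- range(v, na.rm = TRUE)
  r + c(-1, 1) * mult * diff(r)
}

lines <- data.frame(intercept = rep(0, 5), slope = c(0.1, 0.25, 0.5, 1, 2))
# with data spanning exactly [0, 1], the panel spans [-0.05, 1.05]
xmax <- expanded_range(c(0, 1))[2]
ymax <- expanded_range(c(0, 1))[2]

# same pmin algebra as above, now against the panel edges
lines$xcalc <- pmin((ymax - lines$intercept) / lines$slope, xmax)
lines$ycalc <- pmin(lines$intercept + lines$slope * xmax, ymax)
lines
```

In practice you would pass `df$x` and `df$y` rather than `c(0, 1)`. A label placed exactly on the panel edge will be partly cut off, so an `hjust`/`vjust` or nudge is still needed.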

camille
  • Wow, amazing! a) thanks for the intro to `geomtextpath` and b) Doh! I don't know why I was thinking I'd need the *pixel* equivalents of my numbers when... I flipping have *both* xmax and ymax already to do the `pmin` constraining required! This is awesome, thanks for the quick help! – Hendy Jan 18 '22 at 14:21
  • I didn't know geomtextpath was keeping out-of-bounds values; it might be good to drop them so that `hjust` behaves less surprisingly. Thanks for mentioning :) – teunbrand Jan 18 '22 at 14:41
  • @teunbrand yeah, I dug around through the issues trying to figure out how that worked exactly and wasn't sure if this was the behavior you all wanted or not. It makes sense in some ways but is also counter to what hjust would normally imply. I could probably put together an issue if you think that would be helpful. – camille Jan 18 '22 at 14:51
  • What I was expecting was for hjust to align along the included section of the path, like based on the path rather than the limits of the plot. I suppose you could do that the same way I calculated & constrained coordinates here, but that might be tricky with more complicated paths. – camille Jan 18 '22 at 14:55
  • Indeed, I agree that this makes the most sense. Internally, the `slope = 2` runs up to the [1, 2] point, whereas we only observe the line up to the [0.5, 1]. If you want to make sure we don't forget, you can post an issue if you'd like. But I'm pretty sure I'll recall this :) – teunbrand Jan 18 '22 at 15:01
  • @teunbrand Fair enough! I appreciate the package, this was a fun first try using it – camille Jan 18 '22 at 15:39
  • I went away to write an [issue/proposal](https://github.com/AllanCameron/geomtextpath/issues/60), and didn't see all the additional comments until now. @camille I cited you (though I didn't know your GitHub handle) and reproduced your idea there, with an attempt to explain how I thought the package could be improved for this use case. – Hendy Jan 18 '22 at 18:45
  • And @teunbrand I didn't realize you were involved in that package! Hopefully I've helped document via that issue and saved some work. I'd also be happy to try figuring something out, though as mentioned have zero experience poking in R packages, only in using them. Still... let me know. – Hendy Jan 18 '22 at 18:46
  • @camille would you like to update your first example with [this implementation](https://github.com/AllanCameron/geomtextpath/issues/60#issuecomment-1015794602) of `scale_hjust_manual`? It still requires some guess-and-check, but only takes dialing in on one value vs. both x and y. I was already going to accept your answer, but I think this will make it that much more complete. – Hendy Jan 18 '22 at 20:32
  • I did try tweaking that a bit but never was satisfied with it. You should just post that as your own answer, grab a couple votes, and accept whichever one you'd like – camille Jan 18 '22 at 20:52
  • @camille done. I still see your answer as having solved this, thanks again. – Hendy Jan 18 '22 at 22:10

Adding an answer as either a slight improvement on camille's first answer or just a different approach, depending on your perspective. This is not my own work; it simply recreates the approach the geomtextpath package creator suggested in this comment.

So far, there are four solutions in total:

  • my trial and error manual approach
  • camille's using geom_textabline with the default hjust=0.5 location
  • camille's finding the extremes of the plot, constraining to xmax or ymax, whichever is hit first
  • this one, which uses scale_hjust_manual to reduce the required trial and error by at least half (just need one value vs. getting both x and y right)
# same setup as in question
library(geomtextpath)

p + geom_textabline(aes(intercept = intercept, 
                        slope = slope,
                        label = as.character(slope),
                        hjust = as.character(slope)),
                    data = lines,
                    gap = FALSE,
                    text_only = TRUE,
                    offset = unit(0.2, "lines"),
                    color = "red") +
  # one hjust value per slope level, matched to the levels in sorted order
  scale_hjust_manual(values = c(0.65, 0.65, 0.65, 0.65, 0.5))

[Figure: lines labeled via geom_textabline]
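One caveat when hand-tuning values for scale_hjust_manual(): unnamed values are matched to the levels of the mapped variable, and the levels of `as.character(slope)` are sorted as text. That matches numeric order for the question's slopes, but not once a slope of 10 or more appears. A small base-R check of that ordering, plus a named vector of hypothetical hjust values that avoids relying on position, assuming scale_hjust_manual() accepts named values the way ggplot2's other manual scales do:

```r
slopes <- c(0.1, 0.25, 0.5, 1, 2, 10)

# the levels a discrete scale sees for as.character(slope): sorted as text
lvls <- sort(unique(as.character(slopes)))
lvls
# "10" sorts before "2", so positional values would be misassigned

# hypothetical hand-tuned hjust per slope, keyed by label instead of position
hjust_values <- setNames(c(0.65, 0.65, 0.65, 0.65, 0.5, 0.3),
                         as.character(slopes))
hjust_values[lvls]
# usage (assumption): scale_hjust_manual(values = hjust_values)
```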

For anyone who wants to follow along, it sounds like there is hope for a "true solution" in geomtextpath, one that gets hjust to "do the right thing" under the hood.

Hendy