I am making plots with ggplot2 in R, and I am having trouble combining smoothing with a continuous color scale. More specifically, I would like to draw a set of smoothed lines whose color changes along the x-axis, so that each line is, for example, darkest near its right endpoint. If I were doing this with piecewise linear curves (instead of smoothed lines), I would do something like:

library(ggplot2)

# z needs 50 repeats (not 500) of the 20-value pattern to match the 1000 rows
d <- data.frame(id = rep(1:100, 10), x = rep(1:10, each = 100),
                y = rep(1:10, each = 100) + rnorm(1000),
                z = factor(rep(rep(c("a", "b"), each = 10), 50)))


ggplot(d, aes(x = x, y = y, group = id, col = x)) +
       geom_line()  

which works perfectly fine. However, if I use a smoother rather than just connecting the points, I do not get the same result: all lines simply become black with the following code:

ggplot(d, aes(x = x, y = y, group = id, col = x)) +
       geom_line(stat = "smooth", method = "loess")

Any hints as to why this happens and what can be done about it would be very much appreciated! I have seen this post, which suggests that one has to smooth the data before plotting, but I would very much like to do everything in my ggplot() call.


I have already tried two things worth mentioning. First, using geom_smooth() directly makes no difference (though it does change the default line color):

ggplot(d, aes(x = x, y = y, group = id, col = x)) +
       geom_smooth(se = FALSE, method = "loess")

Secondly, col does seem to be the correct parameter to target, since when coloring is chosen according to a discrete variable, everything works:

ggplot(d, aes(x = x, y = y, group = id, col = z)) +
       geom_line(stat = "smooth", method = "loess")  
AHP
  • When you are using a smoother, it is calculating a new variable, to which you can't really map values. Try precalculating the loess smoother outside and map `x` to color as you do it for `geom_line()`. – Roman Luštrik Apr 28 '17 at 11:33
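
A minimal sketch of the precalculation this comment describes, using only base R plus ggplot2. The helper name `smooth_one` and the 50-point grid are assumptions made for illustration; they are not part of the original post.

```r
library(ggplot2)

set.seed(1)
# Same layout as the question's data: 100 ids, each observed at x = 1, ..., 10
d <- data.frame(id = rep(1:100, 10),
                x  = rep(1:10, each = 100),
                y  = rep(1:10, each = 100) + rnorm(1000))

# Hypothetical helper: fit loess to one id and predict on a finer grid
# that stays inside the observed x range (loess does not extrapolate by default)
smooth_one <- function(df, n = 50) {
  fit  <- loess(y ~ x, data = df)
  grid <- data.frame(x = seq(min(df$x), max(df$x), length.out = n))
  data.frame(id = df$id[1], x = grid$x, y = predict(fit, newdata = grid))
}

# One smoothed data frame per id, stacked into a single data frame
smoothed <- do.call(rbind, lapply(split(d, d$id), smooth_one))

# x is now an ordinary column, so it can be mapped to colour as usual
p <- ggplot(smoothed, aes(x = x, y = y, group = id, colour = x)) +
  geom_line()
```

Printing `p` draws the plot. The smoothing happens once, when `smoothed` is built, rather than inside the ggplot() call.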

1 Answer

My suggestion in the other question is still the "right" way to do it. If you really don't want to modify your original dataframe, you can pipe your way through the broom package, with something like:

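library(dplyr)    # %>%, group_by(), do()
library(broom)    # augment()
library(ggplot2)

# Fit one loess per id; augment() returns the data with a .fitted column
d %>% 
  group_by(id) %>% 
  do(augment(loess(y ~ x, data = .))) %>% 
  ggplot(aes(x = x, y = .fitted, group = id, colour = x)) +
  geom_line()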

Throughout I'm using only a subset of the data (d %>% filter(id %in% 1:10)) to make it clearer/faster.

While this way is more "elegant", it means that you have to run the model fit every time you re-draw the figure (which also happens when you use stat_smooth() by the way). This can make performance (very) slow.

In addition, you'll notice the lines are kinked, not smooth. They are smoothed from the raw data, but the fitted values are computed only at the original x values, and the gaps between those are too large for the connected segments to look like a smooth curve.

The way around this is to make explicit what stat_smooth is doing: calculating a new dataframe of xs and ys from the model. To do that, you supply newdata= to augment. The side effect of this is you lose your old y (and z) values.

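# Predict on a finer grid of x values; loess() does not extrapolate by default,
# so the grid stays inside the observed range of x (1 to 10)
d %>% 
  group_by(id) %>% 
  do(augment(loess(y ~ x, data = .),
             newdata = data.frame(x = seq(1, 10, by = 0.1)))) %>% 
  ggplot(aes(x = x, y = .fitted, group = id, colour = x)) +
  geom_line()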
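
If the z values are still needed, they can be joined back on after smoothing, because in this example z takes a single value per id. The following is only a sketch under that assumption. It calls predict() directly rather than augment(), and the names `x_grid`, `smoothed` and `id_info` are made up for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Same layout as the question's data, with z constant within each id
d <- data.frame(id = rep(1:100, 10),
                x  = rep(1:10, each = 100),
                y  = rep(1:10, each = 100) + rnorm(1000),
                z  = factor(rep(rep(c("a", "b"), each = 10), 50)))

# Finer grid of x values inside the observed range
x_grid <- data.frame(x = seq(1, 10, length.out = 91))

# Smooth each id on the grid; only id, x and .fitted survive this step
smoothed <- d %>%
  group_by(id) %>%
  do(data.frame(x = x_grid$x,
                .fitted = predict(loess(y ~ x, data = .), newdata = x_grid))) %>%
  ungroup()

# One row per id holding the per-id variable that was lost
id_info <- distinct(d, id, z)

# Join z back on by id
smoothed <- left_join(smoothed, id_info, by = "id")

p <- ggplot(smoothed, aes(x = x, y = .fitted, group = id, colour = x)) +
  geom_line() +
  facet_wrap(~ z)
```

This would not recover variables that change within an id (such as the original y), since the grid points no longer correspond to observed rows.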

The most hackish and inadvisable method is to use stat_smooth's internally calculated variables, which are mostly undocumented and subject to change without notice. Hadley Wickham explicitly discourages this.

But let's throw caution to the wind!

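# after_stat(x) refers to the x values computed by the smoother
# (written ..x.. in ggplot2 versions before 3.3.0)
d %>% 
  ggplot(aes(x = x, y = y, group = id, colour = x)) +
  geom_line(stat = "smooth", method = "loess", aes(colour = after_stat(x)))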

Finally, of course you can put any sort of algebraic expression in for colour=. Try colour = sin(x^2/2).
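
For concreteness, one way to try this on the smoother's computed x values is sketched below. It assumes ggplot2 3.3.0 or later, where after_stat() is the current spelling of the ..x.. notation, and it builds a small stand-in data set (`d_small`, a name chosen here) so that it runs on its own.

```r
library(ggplot2)

set.seed(1)
# Stand-in data: 10 ids, each observed once at x = 1, ..., 10
d_small <- data.frame(id = rep(1:10, 10), x = rep(1:10, each = 10))
d_small$y <- d_small$x + rnorm(100)

# Colour each smoothed line by an arbitrary function of the smoother's x values
p <- ggplot(d_small, aes(x = x, y = y, group = id)) +
  geom_line(stat = "smooth", method = "loess", formula = y ~ x,
            aes(colour = after_stat(sin(x^2 / 2))))
```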

This illustrates why this hasn't been coded in as an intentional use case. It's ugly, doesn't add information, and distracts from the actual information. So maybe stop and think long and hard about why it is you want to do this at all.

Brian
  • Thank you so much for the super thorough answer! Ultimately, I want to use the method for plotting smoothed lines with different end-points and I want the lines colored according to how far along the line one is. Basically, I want to make it easier to see if something special happens just before a line stops - so there is some point to it, beyond this minimal example. – AHP May 02 '17 at 13:43
  • But, if you don't mind, I have a follow-up question: It seems to me like you feel that doing it all in the geom_line()-call is somehow "wrong". I understand that the ggplot implementation does not allow one to do this in a non-hacky way, and of course, this is a valid reason for not doing it, but are there any other problems that I am overlooking (besides, obviously, the computation time issue)? Because my next step here would otherwise be to write a geom_line2()-function myself that supports adding colors. – AHP May 02 '17 at 13:47
  • If you're willing to develop your own `geom`s you're farther along than me! I guess at that point you just have to consider whether this use case is the best way to represent your data, and what perceptual concerns there might be with such a figure. Look into the `ggraph` package which has `geom_edge_***2` for interpolating aesthetics along a line for network graphs and trees, where it's perhaps less distracting and more expected. – Brian May 02 '17 at 21:02
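
For the use case described in these comments (lines with different endpoints, coloured by how far along each line a point is), one possible sketch is to smooth outside ggplot and add a per-line progress variable. The data `d2`, the helper `smooth_one` and the column `progress` are all hypothetical, and the grey-to-black scale is just one choice that makes lines darkest at their right ends.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 10 lines observed at x = 1, 2, ...; line i stops at x = 10 + i
d2 <- expand.grid(x = 1:20, id = 1:10)
d2 <- d2[d2$x <= 10 + d2$id, ]
d2$y <- d2$x + rnorm(nrow(d2))

# Hypothetical helper: smooth one line on a fine grid and record progress along it
smooth_one <- function(df, n = 50) {
  fit   <- loess(y ~ x, data = df)
  x_new <- seq(min(df$x), max(df$x), length.out = n)
  data.frame(id = df$id[1],
             x  = x_new,
             y  = predict(fit, newdata = data.frame(x = x_new)),
             # 0 at the left end of this line, 1 at its right end
             progress = (x_new - min(x_new)) / (max(x_new) - min(x_new)))
}

smoothed <- do.call(rbind, lapply(split(d2, d2$id), smooth_one))

# Map progress (not x) to colour so every line is darkest where it stops
p <- ggplot(smoothed, aes(x = x, y = y, group = id, colour = progress)) +
  geom_line() +
  scale_colour_gradient(low = "grey80", high = "black")
```

Because `progress` is rescaled per line, a line that stops at x = 11 and one that stops at x = 20 both reach the darkest colour at their own endpoints.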