45

I'm trying to figure out if it's possible to connect across missing values using geom_line. For example, in the link below there are missing values at time 3 in facet F. I'd like a line to connect time 2 and 4 in that case. Is there a way to achieve this?

https://farm8.staticflickr.com/7061/6964089563_b150e0c2a6.jpg

I have a data frame of cumulative values like so:

head(cumulative)

  individual series Time     Value
1          A      x    1 -1.008821
2          A      x    2 -2.273712
3          A      x    3 -3.430610
4          A      x    4 -4.618860
5          A      x    5 -4.893075
6          A      x    6 -5.836532

Which I'm plotting with:

ggplot(cumulative, aes(x=Time,y=Value, shape=series)) + 
    geom_point() + 
    geom_line(aes(linetype=series)) + 
    facet_wrap(~ individual, ncol=3)
Glorfindel
stuwest
  • I have asked a follow-up question: http://stackoverflow.com/questions/27676179/connect-points-across-selected-nas-with-geom-line – PatrickT Dec 28 '14 at 12:08

2 Answers

69

Richie's answer is very thorough, but I wanted to show something simpler. Since lines are not drawn to NA points, another approach is to drop these points when drawing the lines. This implicitly makes a linear interpolation between the remaining points (as straight lines do).

Using dfr from Richie's answer, without needing the step that calculates z:

ggplot(dfr, aes(x,y)) + 
  geom_point() +
  geom_line(data=dfr[!is.na(dfr$y),])

For that matter, in this case the subsetting could be applied to the whole plot, since the dropped rows have no points to draw either.

ggplot(dfr[!is.na(dfr$y),], aes(x,y)) + 
  geom_point() +
  geom_line()
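
For the faceted, multi-series layout in the question, the same idea can be sketched as follows. The data here is a made-up stand-in for cumulative (the random values, the two individuals and the two series are assumptions for illustration only), with Time 3 missing in facet F:

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: two individuals, two series each.
set.seed(1)
cumulative <- expand.grid(
  Time = 1:6,
  series = c("x", "y"),
  individual = c("A", "F"),
  stringsAsFactors = FALSE
)
cumulative$Value <- rnorm(nrow(cumulative))

# Remove the observations at Time 3 in facet F, as in the question.
cumulative$Value[cumulative$individual == "F" & cumulative$Time == 3] <- NA

# Keep only rows with an observed Value; only these are passed to geom_line().
observed <- cumulative[!is.na(cumulative$Value), ]

p <- ggplot(cumulative, aes(Time, Value, shape = series)) +
  geom_point(na.rm = TRUE) +                            # points from the full data
  geom_line(data = observed, aes(linetype = series)) +  # lines skip the NA rows
  facet_wrap(~ individual, ncol = 3)
print(p)
```

Because series is mapped to a discrete aesthetic, ggplot groups the lines by series within each facet, so each line runs straight from Time 2 to Time 4 in facet F.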
Brian Diggs
  • Yes! This is exactly the solution I was looking for. Now my plot command is: `ggplot(cumulative, aes(Time,Value,shape=series)) + geom_point() + geom_line(data=cumulative[!is.na(cumulative$Value),],aes(linetype=series)) + facet_wrap(~ individual, ncol=3)` And my graph comes out looking like: http://farm8.staticflickr.com/7064/6969423337_125cee3cdd_b.jpg – stuwest Mar 10 '12 at 14:25
  • What if you have more than one set of `y`? e.g. y1 = runif(10), y2 = runif(10), y3=runif(10)... and all the y's have NA's in different places. Will this still work? – Ben S. Aug 03 '16 at 04:50
  • 1
    @BenS. Then you would need to use the first version, with a separate `geom_line` call for each line, each one containing a `data` argument that removes the `NA` entries. Typically, these sorts of graphs are better handled by `ggplot` with melted (long form) data, but that's a whole different discussion. – Brian Diggs Aug 04 '16 at 21:28
  • One can also `gather` the different lines before the plot and then filter. Something like `cumulative %>% gather("y_key", "y_val", y1:y4) %>% filter(!is.na(y_val)) %>% ggplot(aes(x, y_val, color = y_key)) + geom_line() + ... ` – Diego-MX Jan 04 '17 at 21:06
15

Lines aren't drawn if a value is NA. You need to replace these values by interpolating across the missing points. There are many different interpolation algorithms, so you need to experiment with several and see which one suits your data best. This example uses linear interpolation via interp1 in the pracma package.

Sample data:

library(ggplot2)
library(pracma)  # provides interp1()

dfr <- data.frame(
  x = 1:10,
  y = runif(10)
)
dfr[c(3, 6, 7), "y"] <- NA

Interpolation step:

dfr$z <- with(dfr, interp1(x, y, x, "linear"))

Compare plots:

ggplot(dfr, aes(x, y)) + geom_line()
ggplot(dfr, aes(x, z)) + geom_line()

If you are showing this graph to other people, make sure that you clearly mark the places where you've synthesised data by interpolating (maybe using dotted lines).
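
As a rough sketch of the same interpolation step without the pracma dependency, base R's approx() can be used instead. This swaps in a different function from the one above. The interpolated column is a hypothetical helper name, added here only to flag the synthesised rows:

```r
library(ggplot2)

set.seed(42)
dfr <- data.frame(x = 1:10, y = runif(10))
dfr[c(3, 6, 7), "y"] <- NA

# approx() drops the NA pairs and interpolates linearly at the requested xout.
dfr$z <- approx(dfr$x, dfr$y, xout = dfr$x)$y

# Flag rows whose z value was synthesised rather than observed
# ('interpolated' is an illustrative column name, not part of any API).
dfr$interpolated <- is.na(dfr$y)

p <- ggplot(dfr, aes(x, z)) +
  geom_line() +
  geom_point(aes(shape = interpolated))  # different symbol for synthesised points
print(p)
```

Mapping the flag to point shape is one way to make the synthesised values visible; this only fills interior gaps, since approx() returns NA outside the range of the observed x values by default.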


Update based on comment:
You can specify different aesthetics for different geoms.

ggplot(dfr, aes(x)) + 
  geom_point(aes(y = y)) +
  geom_line(aes(y = z))

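To use different line types for the missing and non-missing stretches of y, draw the observed values with a solid line and the interpolated values with a dotted line; the dotted line only shows through where the solid one has gaps: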

ggplot(dfr, aes(x)) + 
  geom_point(aes(y = y)) +
  geom_line(aes(y = y)) +
  geom_line(aes(y = z), linetype = "dotted")
Richie Cotton
  • Thanks. In this case I'm plotting the points using geom_point and then connecting them with geom_line. It sounds like I'd have to use the original dataframe to plot the points and then the dataframe with interpolated values to draw the lines. – stuwest Mar 08 '12 at 15:21