
I'm plotting a time series where I map colour to a factor variable. The problem is that the factor levels occur in separate time windows throughout the data, so for a given level the end of one window is connected by a line to the beginning of the next window of that level. This line cuts across the other level, which is plotted between the two windows. I've changed geom_line() to geom_point(), which is okay, but I'd prefer to have the lines. Here's code to create a sample data frame:

# Create the sample data frame (set.seed() only makes the runif() values reproducible)
set.seed(1)
df <- data.frame(
  t = c(1361347202, 1361347212, 1361347222, 1361347232,
        1361347242, 1361347252, 1361347262),
  y = runif(7, 1, 5),
  l = factor(c(1, 1, 1, 2, 2, 1, 1))
)

And here's the plot command,

library(ggplot2)
# l is already a factor, so it can be mapped to colour directly
ggplot(df, aes(x = t, y = y, colour = l)) + geom_line()

I'd like the reddish line (level 1) to stop at the 3rd point and start again at the 6th point. Also, I don't think it matters, but the x-values are actually POSIX times; I've just converted them to numeric values for this question. Thanks.
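For reference, here is a minimal sketch of the same sample data with the x-values kept as POSIXct times. The `tz = "UTC"` setting is an assumption, since the question doesn't say which time zone the original timestamps use. With a POSIXct column ggplot2 uses a date-time x scale, but the unwanted connecting line is exactly the same.

```r
library(ggplot2)

# Same sample data, but t is converted to POSIXct.
# ASSUMPTION: the timestamps are seconds since the Unix epoch in UTC.
set.seed(1)
df <- data.frame(
  t = as.POSIXct(c(1361347202, 1361347212, 1361347222, 1361347232,
                   1361347242, 1361347252, 1361347262),
                 origin = "1970-01-01", tz = "UTC"),
  y = runif(7, 1, 5),
  l = factor(c(1, 1, 1, 2, 2, 1, 1))
)

# Only the x scale changes (date-time instead of numeric);
# the level-1 line still jumps from the 3rd point to the 6th.
p <- ggplot(df, aes(x = t, y = y, colour = l)) + geom_line()
```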

tonytonov
Riley381
  • Try [this](https://dl.dropboxusercontent.com/u/59818410/example2.jpg). – Andre Silva May 23 '14 at 19:16
  • Andre Silva, this seems to work. I'm kind of new to ggplot2. It looks like you create a dummy group variable, group all of the data by it, and then colour the line based on a second variable. Is this correct? – Riley381 May 27 '14 at 17:31
  • Yes. You've understood correctly. – Andre Silva May 27 '14 at 18:38
  • This is not the same as the question marked dupe: in the "dupe" we have one continuous line segment, while this question is asking about actually separating the data like in the accepted answer. see https://stackoverflow.com/questions/14821064/line-break-when-no-data-in-ggplot2 – qwr Dec 30 '19 at 08:06
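The linked image isn't reproduced here, so the following is only a sketch of the approach as Riley381 describes it, not necessarily Andre Silva's exact code: every row is mapped to one dummy group, so ggplot2 draws a single path and lets the colour change along it.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  t = c(1361347202, 1361347212, 1361347222, 1361347232,
        1361347242, 1361347252, 1361347262),
  y = runif(7, 1, 5),
  l = factor(c(1, 1, 1, 2, 2, 1, 1))
)

# group = 1 puts all rows in one group, so geom_line() draws one connected
# path in x order; each segment takes the colour of the point it starts from.
p <- ggplot(df, aes(x = t, y = y, colour = l)) +
  geom_line(aes(group = 1)) +
  geom_point()
```

Because all points share one path, consecutive points are always connected, including across a change of level.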

1 Answer


You have to modify the group aesthetic for geom_path.

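library(ggplot2)

# TRUE wherever the level changes between consecutive rows
ind <- as.numeric(df$l[-1]) - as.numeric(df$l[-nrow(df)]) != 0
# Split x into chunks, starting a new chunk right after each position in pos
splitAt <- function(x, pos) split(x, cumsum(seq_along(x) %in% (pos + 1)))
l1 <- splitAt(as.numeric(df$l), which(ind))
names(l1) <- 1:length(l1)
# Replace every value in the i-th chunk with i, giving one ID per run
l2 <- lapply(seq_along(l1),
             function(y, n, i) {
               as.numeric(rep(n[[i]], length(y[[i]])))
             }, y = l1, n = names(l1))
ggplot(df, aes(x = t, y = y, colour = l)) +
  geom_point() +
  geom_path(aes(group = unlist(l2)))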

[Image: resulting plot]

Here's a brief explanation. First, we need grouping indices to use as the group aesthetic. I assume that a group consists of several consecutive points of the same colour (red or blue). ind marks where the line breaks should appear: it is TRUE wherever the level changes from one row to the next. Next, we build a grouping variable that, for your example, looks like c(1, 1, 1, 2, 2, 3, 3), which says which points are connected to each other. I do this in two steps: first, split the level vector right after each position where ind is TRUE and store the pieces in l1; then replace the values so that the i-th element of the list contains only the value i. The result is stored in l2 and looks like this:

[[1]]
[1] 1 1 1

[[2]]
[1] 2 2

[[3]]
[1] 3 3

Turn this into a vector with unlist() and we're done. The difference between my answer and the one provided by @AndreSilva is how the transition from one colour to another is treated: his plot keeps a single connected line whose colour changes at each transition, while mine breaks the line so that each run of one colour is drawn separately. My answer looks more complicated because the groups have to be specified exactly, which takes a few intermediate steps. Here's his plot for the same data:

[Image: Andre Silva's plot for the same data]
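A shorter way to build the same grouping vector, c(1, 1, 1, 2, 2, 3, 3), is to mark the rows where the level differs from the previous row and take a cumulative sum. This is a sketch, not part of the original answer; it assumes df is already sorted by t, and run is just a column name chosen here. Storing the IDs as a column keeps them aligned with the rows of df instead of relying on an external vector inside aes().

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  t = c(1361347202, 1361347212, 1361347222, 1361347232,
        1361347242, 1361347252, 1361347262),
  y = runif(7, 1, 5),
  l = factor(c(1, 1, 1, 2, 2, 1, 1))
)

# ASSUMPTION: rows are in time order.
# A new run starts at row 1 and wherever l differs from the previous row;
# cumsum() turns those TRUE values into consecutive run IDs (1 1 1 2 2 3 3).
df$run <- cumsum(c(TRUE, df$l[-1] != df$l[-nrow(df)]))

# The same IDs via run-length encoding, for comparison
r <- rle(as.integer(df$l))
run_rle <- rep(seq_along(r$lengths), r$lengths)

p <- ggplot(df, aes(x = t, y = y, colour = l)) +
  geom_point() +
  geom_path(aes(group = run))
```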

tonytonov
  • I'm trying to work through this. What does `ind` represent? Also, do you see any advantages or disadvantages of Andre Silva's solution versus yours? – Riley381 May 27 '14 at 17:33
  • @Riley381 Sorry, my bad. See the edit, hope this helps. – tonytonov May 28 '14 at 06:49
  • (+1). I think you've provided an alternative. It was not clear to me if the lines should be connected or not. I guessed they could because the data was a continuous time series. – Andre Silva May 28 '14 at 12:05
  • @AndreSilva Thanks to you as well. – Riley381 May 28 '14 at 19:53
  • The clean solution is to use NAs to tell ggplot where not to plot. See https://stackoverflow.com/questions/14821064/line-break-when-no-data-in-ggplot2 – qwr Dec 30 '19 at 08:20
  • However that comes at the expense of possibly introducing many many NAs. – qwr Dec 30 '19 at 08:33
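One way to apply the NA idea to this data is sketched below; it is not taken from the linked question, and long is just an illustrative name. Each level gets its own copy of the full time axis with y set to NA outside that level, so geom_line() breaks the line at the gaps instead of bridging them. The cost qwr mentions is visible here: the data grow to (number of levels) × (number of rows), with NA in every row that belongs to another level.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  t = c(1361347202, 1361347212, 1361347222, 1361347232,
        1361347242, 1361347252, 1361347262),
  y = runif(7, 1, 5),
  l = factor(c(1, 1, 1, 2, 2, 1, 1))
)

# One copy of the time axis per level; y is NA where the row has another level
long <- do.call(rbind, lapply(levels(df$l), function(lv) {
  data.frame(t = df$t,
             y = ifelse(df$l == lv, df$y, NA),
             l = factor(lv, levels = levels(df$l)))
}))

# Interior NAs break each line; leading/trailing NAs are dropped
# (ggplot2 warns about the rows it removes).
p <- ggplot(long, aes(x = t, y = y, colour = l)) +
  geom_line()
```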