
I am using R to plot some data.

Date <- c("07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
      "07/12/2012 08:00:00","07/12/2012 10:00:00","07/12/2012 11:00:00")
Date <- strptime(Date, "%d/%m/%Y %H:%M")
Counts <- c("0","3","10","6","5","4")
Counts <- as.numeric(Counts)
df1 <- data.frame(Date,Counts,stringsAsFactors = FALSE)
library(ggplot2)
g = ggplot(df1, aes(x=Date, y=Counts)) + geom_line(aes(group = 1))
g

How do I ask R not to plot data as a continuous line when there is a break in time? I normally have a data point every hour, but sometimes there is a break (between 8 am and 10 am). Between these points, I don't want the line to connect. Is this possible in R?

Edit

Many thanks for the responses here. My data is now in 10 second intervals, and I wish to do the same piece of analysis using this data.

df <- structure(list(Date = c("11/12/2012", "11/12/2012", "11/12/2012", 
                     "11/12/2012", "11/12/2012", "11/12/2012", "11/12/2012", 
                     "11/12/2012", "11/12/2012", "11/12/2012", "11/12/2012"),
                     Time = c("20:16:00", "20:16:10", "20:16:20", "20:16:30", 
                     "20:16:40", "20:16:50", "20:43:30", "20:43:40", 
                     "20:43:50", "20:44:00", "20:44:10"),
                     Axis1 = c(181L, 14L, 65L, 79L, 137L, 104L, 7L, 0L, 0L, 
                     14L, 0L),
                     Steps = c(13L, 1L, 6L, 3L, 8L, 4L, 1L, 0L, 0L, 0L, 0L)),
                .Names = c("Date", "Time", "Axis1", "Steps"),
                row.names = c(57337L, 57338L, 57339L, 57340L, 57341L, 57342L, 
                57502L, 57503L, 57504L, 57505L, 57506L), class = "data.frame")

I think I understand what the code is trying to do when it adds the grouping column to the original data frame, but my question is how I get R to know that the data is now in 10 second intervals. When I apply the first line of code to determine whether the values are continuous or whether there is a gap (e.g. idx <- c(1, diff(df$Time))), I get the following error:

Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : 
  non-numeric argument to binary operator

After my Time variable, do I need to add as.POSIXct to ensure it recognises the time correctly?

zx8754
KT_1

3 Answers


You'll have to set the group aesthetic so that the points you'd like connected share a common value. Here, you can set the first 4 values to, say, 1 and the last 2 to 2, and keep them as a factor. That is,

df1$grp <- factor(rep(1:2, c(4,2)))
g <- ggplot(df1, aes(x=Date, y=Counts)) + geom_line(aes(group = grp)) + 
                     geom_point()

Edit: Once you have your data.frame loaded, you can use this code to automatically generate the grp column:

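# time difference to the previous row (1 = the expected one-hour step)
idx <- c(1, diff(df1$Date))
# start positions of each run, plus one past the end
i2 <- c(1, which(idx != 1), nrow(df1) + 1)
# one group id per run of consecutive hourly rows
df1$grp <- rep(1:length(diff(i2)), diff(i2))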

Note: It is important to add geom_point() as well, because a group that contains a single point (for example, when the LAST entry in the data.frame stands alone after a gap) won't be drawn by geom_line(), as there are not 2 points to connect. In this case, geom_point() will plot it.
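
For what it's worth, the same grouping can also be written with cumsum(), which avoids the index arithmetic. This is only a sketch and not part of the code above: gap_groups is a made-up helper name, and it assumes the timestamps are sorted POSIXct values and that the expected spacing is known in advance.

```r
# Hypothetical helper (not from the original answer): assign a group id to
# each timestamp, starting a new group whenever the gap to the previous
# observation is larger than the expected step.
gap_groups <- function(times, step, units = "secs") {
  # gaps between consecutive observations, converted to explicit units
  gaps <- as.numeric(diff(times), units = units)
  # TRUE marks the first observation after a break; cumsum turns the marks into ids
  cumsum(c(TRUE, gaps > step))
}

# Hourly data from the question: the 09:00 reading is missing.
# tz = "UTC" is an assumption made here to avoid daylight-saving surprises.
Date <- as.POSIXct(c("07/12/2012 05:00:00", "07/12/2012 06:00:00",
                     "07/12/2012 07:00:00", "07/12/2012 08:00:00",
                     "07/12/2012 10:00:00", "07/12/2012 11:00:00"),
                   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
grp <- gap_groups(Date, step = 1, units = "hours")
```

The result could then be stored with df1$grp <- gap_groups(df1$Date, step = 1, units = "hours") and passed to geom_line(aes(group = grp)) exactly as before.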

As an example, I'll generate a data set with more gaps:

# build hourly test data, then drop four rows to create gaps
set.seed(1234)
df <- data.frame(Date=seq(as.POSIXct("05:00", format="%H:%M"), 
                as.POSIXct("23:00", format="%H:%M"), by="hours"))
df$Counts <- sample(19)
df <- df[-c(4,7,17,18),]

# generate the groups automatically and plot
idx <- c(1, diff(df$Date))
i2 <- c(1,which(idx != 1), nrow(df)+1)
df$grp <- rep(1:length(diff(i2)), diff(i2))
g <- ggplot(df, aes(x=Date, y=Counts)) + geom_line(aes(group = grp)) + 
            geom_point()
g


Edit: For your NEW data (assuming it is df),

# as.POSIXct gives a plain date-time vector, which is safer as a data.frame column than POSIXlt
df$t <- as.POSIXct(paste(df$Date, df$Time), format="%d/%m/%Y %H:%M:%S")

idx <- c(10, diff(df$t))
i2 <- c(1,which(idx != 10), nrow(df)+1)
df$grp <- rep(1:length(diff(i2)), diff(i2))

Now plot with aes(x=t, ...).
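
Put together on the data from the question, the steps above look roughly like this. It is a sketch under two assumptions that are mine rather than part of the answer: the timestamps are parsed in UTC (to sidestep daylight-saving jumps), and the gaps are converted to seconds explicitly so that the comparison with 10 does not depend on the units diff() happens to choose.

```r
# The 10-second data from the question (Steps column omitted for brevity)
df <- data.frame(
  Date  = rep("11/12/2012", 11),
  Time  = c("20:16:00", "20:16:10", "20:16:20", "20:16:30", "20:16:40",
            "20:16:50", "20:43:30", "20:43:40", "20:43:50", "20:44:00",
            "20:44:10"),
  Axis1 = c(181L, 14L, 65L, 79L, 137L, 104L, 7L, 0L, 0L, 14L, 0L),
  stringsAsFactors = FALSE
)

# diff() on the character Time column raises the "non-numeric argument" error,
# so combine date and time and convert to a real date-time first
df$t <- as.POSIXct(paste(df$Date, df$Time),
                   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# gap to the previous row in seconds (the first row gets the expected 10)
idx <- c(10, as.numeric(diff(df$t), units = "secs"))
# start positions of each run, plus one past the end
i2 <- c(1, which(idx != 10), nrow(df) + 1)
# one group id per run of rows that are exactly 10 seconds apart
df$grp <- rep(seq_along(diff(i2)), diff(i2))
```

With this, ggplot(df, aes(x = t, y = Axis1)) + geom_line(aes(group = grp)) + geom_point() should draw two separate line segments, one per group.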

Arun
  • (+1) however, in this case, it's more like the OP expects missing values in his data, isn't it? :-) – juba Feb 11 '13 at 21:35
  • Many thanks. Is there a way of doing this automatically without looking at the individual data files (as I have > 1000 files to run in this way, and I won't probably be able to look at them one by one?). And @Juba - yes, I would expect zeros. In my real data, if there is 20 minutes of continuous zeros, these are deleted. – KT_1 Feb 11 '13 at 21:36
  • Yes, as long as you know that the interval is always 1 hour, we can do this. Give me a minute, I'll edit the post. – Arun Feb 11 '13 at 21:38
  • @Arun Ok, ok, I'll surrender :) And great edit, by the way. Too bad I can't upvote you twice! – juba Feb 11 '13 at 22:06
  • :) thanks juba. no issues. @KT_1, of course here I assume that all continuous values are 1 hour apart. Anything more than 1 hour apart will be given another group (until the next entry where I find >1 hour difference). – Arun Feb 11 '13 at 22:08
  • @KT_1, please check my `note` as well. In short, use `+ geom_point()` as well. – Arun Feb 11 '13 at 22:22
  • @Arun. Many thanks. My real data is at 1o second intervals and contained in a column with just time in (called 'Time'). How do I adapt the code: idx <- c(1, diff(df$Date)) i2 <- c(1,which(idx != 1), nrow(df)+1) df1$grp <- rep(1:length(diff(i2)), diff(i2)) ... to recognise this? – KT_1 Feb 18 '13 at 15:41
  • Apologies... it's in 10 second intervals – KT_1 Feb 18 '13 at 16:22
  • @KT_1, I've provided enough ideas and clear method. Why don't you make an attempt to do it for 10hr interval and post the code with the problem you face, so that I can guide you rather than me working on the code entirely..? – Arun Feb 18 '13 at 16:25
  • I have edited my original question @Arun with my real data. I am still confused about how to tell R that my data is every 10 seconds and not every hour. When I change the column name I get an error message; do I need to specify the format of this time column? – KT_1 Feb 19 '13 at 15:07
  • @KT_1, The last edit at the bottom of the post should do it. It is **really** that simple. Changing `1` to `10`. – Arun Feb 19 '13 at 15:28
  • Many thanks @Arun - that's exactly what I wanted. R studio was giving some odd values, but alas everything works when I just use R. I think there is some kind of bug in R studio regards dates. – KT_1 Feb 19 '13 at 16:48
  • `i2 <- c(1,which(idx != 1), nrow(df)+1)` isn't `cumsum()` a better tool to do this? – M-- Mar 19 '19 at 19:32
  • How could you do this with data that had two additional groups. Say I have another group of discontinuous data that I wanted to plot simultaneously? I have data sampled by month for only 6 of 12 months of the year so I'm already grouping by Year to eliminate the connecting lines, but I also have 2 zones of data I want to plot and can't figure out how to do that. – Johnny5ish Sep 15 '20 at 20:56

I think there is no way for R or ggplot2 to know that a data point is missing somewhere unless you specify it explicitly with an NA. For example:

df1 <- rbind(df1, list(strptime("07/12/2012 09:00:00", "%d/%m/%Y %H:%M"), NA))
ggplot(df1, aes(x=Date, y=Counts)) + geom_line(aes(group = 1))


juba
  • (+1) however, in this case, it's more like the OP expects two groups of plots, isn't it? I mean, isn't it more appropriate to set group NOT to 1, but rather to a grouping variable... – Arun Feb 11 '13 at 21:33

Juba's answer, to include explicit NAs where you want breaks, is the best approach. Here is another way to introduce those NAs in the right places (without having to work them out manually).

every.hour <- data.frame(Date=seq(min(df1$Date), max(df1$Date), by="1 hour"))
df2 <- merge(df1, every.hour, all=TRUE)
g %+% df2


You can do something similar with your later df example, after changing the dates and times into a proper format:

# day-first format, matching the "%d/%m/%Y" used for the data in the question
df$DateTime <- as.POSIXct(strptime(paste(df$Date, df$Time), 
                                   format="%d/%m/%Y %H:%M:%S"))
every.ten.seconds <- data.frame(DateTime=seq(min(df$DateTime), 
                                             max(df$DateTime), by="10 sec"))
df.10 <- merge(df, every.ten.seconds, all=TRUE)
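
To see what the merge actually does, here is a self-contained sketch using the 10-second data from the question. The assumptions are mine: dates are read day-first (as in the question's own strptime call) and parsed in UTC, and the Steps column is left out for brevity.

```r
# The 10-second data from the question
df <- data.frame(
  Date  = rep("11/12/2012", 11),
  Time  = c("20:16:00", "20:16:10", "20:16:20", "20:16:30", "20:16:40",
            "20:16:50", "20:43:30", "20:43:40", "20:43:50", "20:44:00",
            "20:44:10"),
  Axis1 = c(181L, 14L, 65L, 79L, 137L, 104L, 7L, 0L, 0L, 14L, 0L),
  stringsAsFactors = FALSE
)
df$DateTime <- as.POSIXct(paste(df$Date, df$Time),
                          format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# every timestamp that should exist between the first and last observation
every.ten.seconds <- data.frame(
  DateTime = seq(min(df$DateTime), max(df$DateTime), by = "10 sec")
)

# all = TRUE keeps timestamps without a reading; their other columns become NA
df.10 <- merge(df, every.ten.seconds, all = TRUE)

# number of filled-in rows, i.e. the readings that were missing
n_missing <- sum(is.na(df.10$Axis1))
```

Plotting df.10 with ggplot(df.10, aes(x = DateTime, y = Axis1)) + geom_line() should then leave the stretch of NA values unconnected.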
Brian Diggs
  • This is a very clean answer. Instead of merge, you can use [`complete`](https://tidyr.tidyverse.org/reference/complete.html) which will fill in NAs of every combination of variables, if you had multiple groups. – qwr Dec 30 '19 at 08:16
  • @qwr If I were writing this answer today, I probably would use something like `complete`. But `tidyr` didn't exist when I wrote this answer. Adding a new answer which shows the solution using `complete` might be useful. Feel free to do so ;) – Brian Diggs Jan 13 '20 at 16:58
  • Simple and effective answer! You could simply do `df %>% dplyr::mutate(var = if_else(condition == TRUE, NA, var))` and replace `condition == TRUE` with whatever you need. – Simon Stolz Jun 09 '20 at 15:57