18

I just have a data file with one column of time series:

'2012-02-01 17:42:44'
'2012-02-01 17:42:44'
'2012-02-01 17:42:44'

... I want to split the data up such that I have a count at the top of hour. Say:

'2012-02-01 17:00:00'  20   
'2012-02-01 18:00:00'  30  

The '20' and '30' represent the number of time series entries for that out period. And I want to be able to graph the time vs that 'count'. How can I do this with R?

Here is my current line graph plot.

library(ggplot2)

req <- read.table("times1.dat")
summary(req)

da <- req$V2
db <- req$V1

time <- as.POSIXct(db)

png('time_data_errs.png', width=800, height=600)
gg <- qplot(time, da) + geom_line()

print(gg)
dev.off()
zx8754
  • 52,746
  • 12
  • 114
  • 209
Berlin Brown
  • 11,504
  • 37
  • 135
  • 203

1 Answers1

32

It sounds like you want to use cut to figure out how many values occur within an hour.

It's generally helpful if you can provide some sample data. Here's some:

set.seed(1) # So you can get the same numbers as I do
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)
head(MyDates)
# [1] "2012-01-01 01:59:29 GMT" "2012-01-01 02:47:27 GMT" "2012-01-01 04:17:46 GMT"
# [4] "2012-01-01 06:48:39 GMT" "2012-01-01 01:30:45 GMT" "2012-01-01 06:44:13 GMT"

You can use table and cut (with the argument breaks="hour" (see ?cut.Date for more info)) to find the frequencies per hour.

MyDatesTable <- table(cut(MyDates, breaks="hour"))
MyDatesTable
# 
# 2012-01-01 00:00:00 2012-01-01 01:00:00 2012-01-01 02:00:00 2012-01-01 03:00:00 
#                  59                  73                  74                  83 
# 2012-01-01 04:00:00 2012-01-01 05:00:00 2012-01-01 06:00:00 2012-01-01 07:00:00 
#                  52                  62                  64                  33 
# Or a data.frame if you prefer
data.frame(MyDatesTable)
#                  Var1 Freq
# 1 2012-01-01 00:00:00   59
# 2 2012-01-01 01:00:00   73
# 3 2012-01-01 02:00:00   74
# 4 2012-01-01 03:00:00   83
# 5 2012-01-01 04:00:00   52
# 6 2012-01-01 05:00:00   62
# 7 2012-01-01 06:00:00   64
# 8 2012-01-01 07:00:00   33

Finally, here's a line plot of the MyDatesTable object:

plot(MyDatesTable, type="l", xlab="Time", ylab="Freq")

enter image description here


cut can handle a range of time intervals. For example, if you wanted to tabulate for every 30 minutes, you can easily adapt the breaks argument to handle that:

data.frame(table(cut(MyDates, breaks = "30 mins")))
#                   Var1 Freq
# 1  2012-01-01 00:00:00   22
# 2  2012-01-01 00:30:00   37
# 3  2012-01-01 01:00:00   38
# 4  2012-01-01 01:30:00   35
# 5  2012-01-01 02:00:00   32
# 6  2012-01-01 02:30:00   42
# 7  2012-01-01 03:00:00   39
# 8  2012-01-01 03:30:00   44
# 9  2012-01-01 04:00:00   25
# 10 2012-01-01 04:30:00   27
# 11 2012-01-01 05:00:00   33
# 12 2012-01-01 05:30:00   29
# 13 2012-01-01 06:00:00   29
# 14 2012-01-01 06:30:00   35
# 15 2012-01-01 07:00:00   33

Update

Since you were trying to plot with ggplot2, here's one approach (not sure if it is the best since I usually use base R's graphics when I need to).

Create a data.frame of the table (as demonstrated above) and add a dummy "group" variable and plot that as follows:

MyDatesDF <- data.frame(MyDatesTable, grp = 1)
ggplot(MyDatesDF, aes(Var1, Freq)) + geom_line(aes(group = grp))

enter image description here

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • @BerlinBrown, I hope it is along the lines of what you were looking for. I've updated the answer with a `ggplot` solution for plotting the aggregated data. – A5C1D2H2I1M1N2O1R2T1 Nov 30 '12 at 18:20
  • I have errors in my data, the freq has zeros and I get this error. How can I get around this. 'no applicable method for 'scale_dimension' applied to an object of class "NULL"' – Berlin Brown Nov 30 '12 at 18:27
  • @BerlinBrown, Hard to say without some sample data and the actual steps you went through. I assume you converted the data to `as.POSIXct` before you used `cut`, right? Do *all* the frequencies have zeroes? – A5C1D2H2I1M1N2O1R2T1 Nov 30 '12 at 18:34
  • OK, I just used your approach and it works. Thanks. One more question, with ggplot, the times are overlapping in the x-axis title. Do you know how to clean that up? – Berlin Brown Nov 30 '12 at 18:36
  • @BerlinBrown See [here](http://docs.ggplot2.org/current/scale_datetime.html). Basic steps: (1) make sure that your dates are `as.POSIXct` (you'll have to convert them again after you've converted your table to a `data.frame`; (2) load `library(scales)`; (3) add something like `last_plot() + scale_x_datetime(breaks = date_breaks("2 hour"))` or `last_plot() + scale_x_datetime(breaks = pretty_breaks())`. – A5C1D2H2I1M1N2O1R2T1 Nov 30 '12 at 19:09
  • @Ananda, I had the same problem as in the question specified. It worked fine with your solution but i have some values related to data&time like value1 which has numeric values how can i perform sum(value1) instead of "freq" column – Pallavi Dec 02 '15 at 06:28