I am building something based upon this helpful post.
I have three related questions about the dataset df:
machine ISOdatetime
1 M1 2013-08-21 18:16:39
2 M1 2013-08-21 18:20:44
3 M1 2013-08-21 18:21:42
4 M1 2013-08-21 18:46:09
5 M1 2013-08-21 18:46:27
6 M1 2013-08-21 19:01:13
etc
I want to figure out how many values occur within each half-hourly period and put the counts in a new data frame, like so:
machine ISOdatetime numberobs
1 M1 2013-08-21 18:30:00 3
2 M1 2013-08-21 19:00:00 2
3 M1 2013-08-21 19:30:00 1
etc
The following code of course works nicely for neat hourly lengths:
df2 <- data.frame(table(cut(df$ISOdatetime, breaks="hour")))
The following code counts in 30 min blocks, but does not start neatly at hourly/half hourly points (it takes the starting point from the first listed time, which is 18:16:39 and designates start as 18:16:00):
df2 <-data.frame(table(cut(df$ISOdatetime, breaks = "30 mins")))
Question 1. What might be an elegant fix? Should I specify the required intervals with something like ints <- c("18:00", "18:30", "19:00" ...), or is that unnecessary?
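One possible approach, sketched below, is to build the break points from the data rather than typing them out: round the earliest time down and the latest time up to a half-hour boundary, then pass that sequence to cut(). This is only a minimal sketch. The sample times are the six shown above, and the UTC time zone and the helper names (half_hour, first_break, last_break) are assumptions made for illustration.

```r
# Sample timestamps copied from the question; the UTC time zone is an assumption.
times <- as.POSIXct(c("2013-08-21 18:16:39", "2013-08-21 18:20:44",
                      "2013-08-21 18:21:42", "2013-08-21 18:46:09",
                      "2013-08-21 18:46:27", "2013-08-21 19:01:13"),
                    tz = "UTC")

half_hour <- 30 * 60  # bin width in seconds
secs <- as.numeric(times)

# First break: earliest time rounded down to a half-hour boundary.
# Last break: latest time pushed up to the next half-hour boundary.
first_break <- floor(min(secs) / half_hour) * half_hour
last_break  <- (floor(max(secs) / half_hour) + 1) * half_hour
breaks <- as.POSIXct(seq(first_break, last_break, by = half_hour),
                     origin = "1970-01-01", tz = "UTC")

# Label each bin by its end time, as in the desired output.
# With the default right = FALSE, a bin includes its start but not its end.
bins <- cut(times, breaks = breaks,
            labels = format(breaks[-1], "%Y-%m-%d %H:%M:%S"))
df2 <- data.frame(table(bins))
names(df2) <- c("ISOdatetime", "numberobs")
print(df2)
```

Because the breaks are derived from the range of the data, there should be no need to list the intervals by hand, and empty half-hours inside the range still appear with a count of 0.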
Question 2. I think I will also run into trouble when I reach parts of the original data frame df that have values for "M2" under df$machine, because the code will just count those as well. I will eventually want to plot each machine separately. Perhaps using subset for each machine will be a quick way to partition the data, but then I will end up with a separate data frame for each machine. Not a problem, but is there an elegant way to build "machine" into the command above?
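One way this might be done without subsetting first is to cross-tabulate machine against the half-hour period in a single table() call. The sketch below is illustrative only: the M1 rows are the six shown above, the two M2 rows are made up, and the UTC time zone and the names period_end, counts and by_machine are assumptions.

```r
# M1 rows come from the question; the two M2 rows are hypothetical.
df <- data.frame(
  machine = c(rep("M1", 6), rep("M2", 2)),
  ISOdatetime = as.POSIXct(c("2013-08-21 18:16:39", "2013-08-21 18:20:44",
                             "2013-08-21 18:21:42", "2013-08-21 18:46:09",
                             "2013-08-21 18:46:27", "2013-08-21 19:01:13",
                             "2013-08-21 18:05:00", "2013-08-21 18:40:00"),
                           tz = "UTC")  # time zone is an assumption
)

half_hour <- 30 * 60  # bin width in seconds

# Round every timestamp up to the end of its half-hour period.
# A time exactly on a boundary is counted in the period ending at that boundary.
period_end <- as.POSIXct(ceiling(as.numeric(df$ISOdatetime) / half_hour) * half_hour,
                         origin = "1970-01-01", tz = "UTC")

# One row per machine/period combination, with the count in numberobs.
counts <- as.data.frame(
  table(machine = df$machine,
        ISOdatetime = format(period_end, "%Y-%m-%d %H:%M:%S")),
  responseName = "numberobs"
)

# Drop combinations that never occur, then split for per-machine plotting.
counts <- counts[counts$numberobs > 0, ]
by_machine <- split(counts, counts$machine)
print(by_machine)
```

Keeping everything in one long data frame (counts) and splitting only at the end means the same object can feed either a single faceted plot or one plot per machine.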
Question 3. In the previous post, the count was presented at the "top of the hour", which is presumably the end time of the hourly interval. That was not easy to check with the small dataset presented there. In my own data, the counts seemed to be off. With breaks = "hour", which interval does each count cover: the hour beginning at the labelled time, or the hour ending at it?
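A small experiment can help check this. As far as the documentation for cut.POSIXt goes, the default is right = FALSE, so each interval should include its start and exclude its end, and the labels should be the interval starts. The sketch below uses three made-up timestamps placed on and around an hour boundary; the UTC time zone is an assumption.

```r
# Three hypothetical timestamps: on the hour, one second before the next hour,
# and exactly on the next hour. The UTC time zone is an assumption.
times <- as.POSIXct(c("2013-08-21 18:00:00",
                      "2013-08-21 18:59:59",
                      "2013-08-21 19:00:00"),
                    tz = "UTC")

hourly <- cut(times, breaks = "hour")

# Integer codes show which bin each timestamp landed in.
print(as.integer(hourly))

# The labels are the starting times of the bins.
print(levels(hourly))
```

If the first two timestamps share a bin and the third starts a new one, then a count labelled 18:00 covers 18:00:00 up to but not including 19:00:00, which would be the start of the hour rather than the end.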
I have read and tried a lot over many hours and am still stuck; any help is very much appreciated.
#
As requested, I have added further info.
My actual data
unit nightof time date isodatetime time2
1 7849 2013-08-21 18:16:39 2013-08-21 2013-08-21 18:16:39 2013-08-22 04:00:00
2 7849 2013-08-21 18:20:44 2013-08-21 2013-08-21 18:20:44 2013-08-22 04:00:00
3 7849 2013-08-21 18:21:42 2013-08-21 2013-08-21 18:21:42 2013-08-22 04:00:00
etc
406 7849 2013-08-21 04:06:10 2013-08-22 2013-08-22 04:06:10 2013-08-22 14:00:00
407 7849 2013-08-21 04:06:12 2013-08-22 2013-08-22 04:06:12 2013-08-22 14:00:00
408 7849 2013-08-21 04:06:28 2013-08-22 2013-08-22 04:06:28 2013-08-22 14:00:00
When I run str() on it:
'data.frame': 408 obs. of 6 variables:
$ unit: int 7849 7849 7849 7849 7849 7849 7849 7849 7849 7849 ...
$ nightof: Date, format: "2013-08-21" "2013-08-21" "2013-08-21" "2013-08-21" ...
$ time: List of 408
..$ : chr "18:16:39"
..$ : chr "18:20:44"
.. [list output truncated]
$ date: Date, format: "2013-08-21" "2013-08-21" "2013-08-21" "2013-08-21" ...
$ isodatetime: POSIXlt, format: "2013-08-21 18:16:39" "2013-08-21 18:20:44" "2013-08-21 18:21:42" "2013-08-21 18:21:48" ...
$ time2: POSIXct, format: "2013-08-22 04:00:00" "2013-08-22 04:00:00" "2013-08-22 04:00:00" "2013-08-22 04:00:00" ...
The modified code I used:
mon$time2 <- with(mon, as.POSIXct(ceiling(as.numeric(isodatetime) / (30 * 60)) * (30 * 60), origin = "1970-01-01"))
with(mon, data.frame(table(time2)))
by(mon, mon$unit, function(x) { data.frame(table(x$time2)) })
The output:
mon$unit: 7849
Var1 Freq
1 2013-08-22 04:00:00 27
2 2013-08-22 04:30:00 13
3 2013-08-22 05:00:00 16
4 2013-08-22 05:30:00 5
5 2013-08-22 06:00:00 8
6 2013-08-22 06:30:00 10
7 2013-08-22 07:00:00 25
8 2013-08-22 07:30:00 22
9 2013-08-22 08:00:00 61
10 2013-08-22 08:30:00 93
11 2013-08-22 09:00:00 54
12 2013-08-22 09:30:00 42
13 2013-08-22 10:00:00 11
14 2013-08-22 10:30:00 2
15 2013-08-22 11:00:00 2
16 2013-08-22 11:30:00 3
17 2013-08-22 12:00:00 2
18 2013-08-22 13:00:00 1
19 2013-08-22 14:00:00 11
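The periods in this output start at 04:00 on 2013-08-22, while the first isodatetime value is 18:16:39 on 2013-08-21. One possible explanation, offered only as a guess, is a time-zone mismatch: as.POSIXct() applied to a number with no tz argument displays the result in the session's local zone, which may differ from the zone the timestamps were parsed in. The sketch below assumes the timestamps are meant to be UTC and passes tz explicitly; the three sample values are taken from the data above, and the variable names are illustrative.

```r
# Assumption: the timestamps are in UTC. If they were parsed in another zone,
# that zone would need to be used in both calls below instead.
iso <- as.POSIXlt(c("2013-08-21 18:16:39",
                    "2013-08-21 18:20:44",
                    "2013-08-22 04:06:10"),
                  tz = "UTC")

half_hour <- 30 * 60  # bin width in seconds

# Round up to the next half-hour boundary, keeping the same time zone
# for display as was used for parsing.
secs <- as.numeric(as.POSIXct(iso))
time2 <- as.POSIXct(ceiling(secs / half_hour) * half_hour,
                    origin = "1970-01-01", tz = "UTC")

print(format(time2, "%Y-%m-%d %H:%M:%S"))
```

With the zone fixed on both sides, the first two values should fall in the period ending 18:30 on 2013-08-21 rather than 04:00 on 2013-08-22.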