0

I have multiple dataframes that look like the first below (following on in part from this post). I summed observations in 30 minute time intervals. But my original dataset shows no rows for periods where there were no observations. So if there were no observations in a 30-minute period, there is no data. The zeros have meaning, so I would like to add them in so I can plot a full pattern over a 'night of observations'. The plot will take the form of multiple overlain lines, each representing different nights, so each table (like that below; from one night) needs the same number of observation periods, and the same start and end points.

I thought about adding a new character vector period<-c("15:30", "16:00", ..., "07:00") but the other columns would need rearranging to match up. Likewise if I broke up the vector starttime .

What I have.

        starttime       Freq
1   2013-08-21 18:00:00     27
2   2013-08-21 18:30:00     13
3   2013-08-21 19:00:00     16
4   2013-08-21 19:30:00     5
5   2013-08-21 20:00:00     8
6   2013-08-21 20:30:00     9
7   2013-08-21 21:00:00     26
8   2013-08-21 21:30:00     22
9   2013-08-21 22:00:00     61
10  2013-08-21 22:30:00     93
11  2013-08-21 23:00:00     54
12  2013-08-21 23:30:00     42
13  2013-08-22 00:00:00     11
14  2013-08-22 00:30:00     2
15  2013-08-22 01:00:00     2
16  2013-08-22 01:30:00     3
17  2013-08-22 02:00:00     2
18  2013-08-22 03:00:00     1
19  2013-08-22 04:00:00     11

str(df)
'data.frame':   19 obs. of  2 variables:
$ time2: Factor w/ 19 levels "2013-08-21 18:00:00",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Freq : int  27 13 16 5 8 9 26 22 61 93 ...

How I want it to be.

        starttime       Freq
1   2013-08-21 15:30:00     0
2   2013-08-21 18:00:00     27
3   2013-08-21 18:30:00     13
4   2013-08-21 19:00:00     16
5   2013-08-21 19:30:00     5
6   2013-08-21 20:00:00     8
7   2013-08-21 20:30:00     9
8   2013-08-21 21:00:00     26
9   2013-08-21 21:30:00     22
10  2013-08-21 22:00:00     61
11  2013-08-21 22:30:00     93
12  2013-08-21 23:00:00     54
13  2013-08-21 23:30:00     42
14  2013-08-22 00:00:00     11
15  2013-08-22 00:30:00     2
16  2013-08-22 01:00:00     2
17  2013-08-22 01:30:00     3
18  2013-08-22 02:00:00     2
19  2013-08-22 02:30:00     0
20  2013-08-22 03:00:00     1
21  2013-08-22 03:30:00     0
22  2013-08-22 04:00:00     11
23  2013-08-22 04:30:00     0
24  2013-08-22 05:00:00     0
25  2013-08-22 05:30:00     0
26  2013-08-22 06:00:00     0
27  2013-08-22 06:30:00     0

Always very grateful for advice.

Edit. Below is a dput

structure(list(time2 = structure(1:19, .Label = c("2013-08-21 18:00:00", 
"2013-08-21 18:30:00", "2013-08-21 19:00:00", "2013-08-21 19:30:00", 
"2013-08-21 20:00:00", "2013-08-21 20:30:00", "2013-08-21 21:00:00", 
"2013-08-21 21:30:00", "2013-08-21 22:00:00", "2013-08-21 22:30:00", 
"2013-08-21 23:00:00", "2013-08-21 23:30:00", "2013-08-22 00:00:00", 
"2013-08-22 00:30:00", "2013-08-22 01:00:00", "2013-08-22 01:30:00", 
"2013-08-22 02:00:00", "2013-08-22 03:00:00", "2013-08-22 04:00:00"
), class = "factor"), Freq = c(27L, 13L, 16L, 5L, 8L, 9L, 26L, 
22L, 61L, 93L, 54L, 42L, 11L, 2L, 2L, 3L, 2L, 1L, 11L)), .Names = c("time2", 
"Freq"), row.names = c(NA, -19L), class = "data.frame")
Community
  • 1
  • 1
ptenax
  • 141
  • 1
  • 14

1 Answers1

2

You can use merge:

times <- data.frame(starttime=seq(
  as.POSIXct("2013-08-21 18:00:00"), 
  as.POSIXct("2013-08-22 06:30:00"),
  by="30 min"
) )
df.fin <- merge(df, times, all.y=T)
df.fin$Freq[is.na(df.fin$Freq)] <- 0
df.fin
#              starttime Freq
# 1  2013-08-21 18:00:00   27
# 2  2013-08-21 18:30:00   13
# 3  2013-08-21 19:00:00   16
# 4  2013-08-21 19:30:00    5
# 5  2013-08-21 20:00:00    8
# ... ommitted values ...
# 20 2013-08-22 03:30:00    0
# 21 2013-08-22 04:00:00   11
# 22 2013-08-22 04:30:00    0
# 23 2013-08-22 05:00:00    0
# 24 2013-08-22 05:30:00    0
# 25 2013-08-22 06:00:00    0
# 26 2013-08-22 06:30:00    0

Also, as sven suggested, if your input data has spaces in a column, it's much easier if you dput it.

BrodieG
  • 51,669
  • 9
  • 93
  • 146
  • Thanks @BrodieG. It almost worked - the new vector `times` is perfect. But the `merge` process matches each new 30 min category in `times` against each entry in `starttime`. Would I need to specify a `by` arguement? I could not modify successfully. `dput` above. Thanks in advance! – ptenax Jan 30 '14 at 20:09
  • @ptenax, the merging column name must be the same for `merge` to work as written. Looks like the `dput` data you added does not have the same name (I used the name from the original table). You just need to use `by.x=nameofyourtimecolumn`, `by.y=starttime`. – BrodieG Jan 30 '14 at 20:12
  • Thanks again @BrodieG, my vector names were correct but I had to convert one of the time vectors from factor to POSIXct for it to work as you describe. Helped me a lot. – ptenax Feb 03 '14 at 08:01