windowed time domain statistics in R: how to window one column and apply statistic methods to others

Question

I have the following data frame in R :

Time      A

1         1
2         1
3         1
4         1
5         2
6         2
7         3
8         3
9         2
10        1
11        1
12        1
13        3
14        3
15        3

Let's consider numbers in Time column are second, i need to define a window of 3 seconds, and apply two or three different methods to A column and have the results for each function in separate columns, lets consider first function is Average, second function is max like this:

Time-window        average    max
   1                  1        1
   2                  2.5      2
   3                   4       3
   4                   1       1
   4                   3       3

How can i do it in R, using any of available libraries.

What did you try so far? Did you links such as [link1](https://stackoverflow.com/questions/32078578/how-to-group-by-every-7-rows-and-aggregate-those-7-values-by-median), [link2](https://stackoverflow.com/questions/52339492/how-to-sum-every-nth-200-observation-in-a-data-frame-using-r)? — Sotos, Sep 12 '19 at 07:39

score 3 · Answer 1 · answered Sep 12 '19 at 08:09

3

A data.table solution.

library(data.table)

dat <- setDT(dat)

dat2 <- dat[, `Time-window` := rep(1:(.N/3), each = 3)][
  , .(average = mean(A), max = max(A)), by = `Time-window`
]

dat2
#    Time-window  average max
# 1:           1 1.000000   1
# 2:           2 1.666667   2
# 3:           3 2.666667   3
# 4:           4 1.000000   1
# 5:           5 3.000000   3

DATA

dat <- read.table(text = "Time      A
1         1
2         1
3         1
4         1
5         2
6         2
7         3
8         3
9         2
10        1
11        1
12        1
13        3
14        3
15        3",
                  header = TRUE, stringsAsFactors = FALSE)

answered Sep 12 '19 at 08:09

www

38,575
12
48
84

maybe `dat[,.(average = mean(A), max = max(A)), by = .("Time-window"=rep(1:(.N/3), each = 3))]`? – jangorecki Sep 12 '19 at 08:45
Thanks, what if instead of having window length as number of rows have it as a time for example every 0.2 second, i know it may not be applicable to this example – chessosapiens Sep 12 '19 at 10:59
@jangorecki Thanks for sharing. Now I see that we can create a new column in the `by` argument directly. This is cool. – www Sep 12 '19 at 13:23
1

@chessosapiens Every 0.2 second is not applicable in this example because you said your time step is second and every 0.2 second is smaller than that. However, if you can figure out a way to create the `Time-window` column, it should work. – www Sep 12 '19 at 13:24

score 1 · Answer 2 · answered Sep 12 '19 at 07:46

1

If you prefer dplyr, you can do:

df %>%
 group_by(time_window = ceiling(Time/3)) %>%
 summarise_at(2, list(mean = mean, max = max))

  time_window  mean   max
  <fct>       <dbl> <int>
1 1            1        1
2 2            1.67     2
3 3            2.67     3
4 4            1        1
5 5            3        3

Or using gl() as already posted by @Ronak Shah for a base R solution:

df %>%
 group_by(time_window = gl(n()/3, 3)) %>%
 summarise_at(2, list(mean = mean, max = max))

answered Sep 12 '19 at 07:46

tmfmnk

38,881
4
47
67

Thanks, what if instead of having window length as number of rows have it as a time for example every 0.2 second, i know it may not be applicable to this example. – chessosapiens Sep 12 '19 at 10:53
If you have rows representing 0.2 seconds, that you can apply the same principle. Otherwise, I cannot really tell how do you mean it. – tmfmnk Sep 12 '19 at 11:13
imagine each row time difference is 0.014 seconds and i want to have window length of 0.2 seconds – chessosapiens Sep 12 '19 at 11:17
You can perhaps try a cumulative sum with reset. Reseting after reach 0.2 seconds. – tmfmnk Sep 12 '19 at 11:18

score 0 · Accepted Answer · answered Sep 12 '19 at 07:43

0

Create a function which applies all the functions you need

apply_fun <- function(x) {
    c(mean = mean(x), max = max(x))
}

Create a grouping column and apply the function by group

n <- 3
df$group <- gl(nrow(df)/n, n)
aggregate(A~group, df, apply_fun)

#  group   A.mean    A.max
#1     1 1.000000 1.000000
#2     2 1.666667 2.000000
#3     3 2.666667 3.000000
#4     4 1.000000 1.000000
#5     5 3.000000 3.000000

answered Sep 12 '19 at 07:43

Ronak Shah

377,200
20
156
213

thanks, i need to add a minor thing what if i want to have the starting second for each window in group column ? for example 1 3 6 9 12 – chessosapiens Sep 12 '19 at 09:42
@chessosapiens you could add `df$Time[match(unique(df$group), df$group)]` to the final output dataframe. – Ronak Shah Sep 12 '19 at 10:04
I may need to make an edit to my original question, how can i define length of window not based on number of rows but based on a value, for example i may want to have time window of 0.2 second. – chessosapiens Sep 12 '19 at 10:52
@chessosapiens for that you might need to use `cut` or `findInterval` to divide it into based on `Time` value. Something like the following https://stackoverflow.com/questions/4126326/how-to-quickly-form-groups-quartiles-deciles-etc-by-ordering-columns-in-a – Ronak Shah Sep 12 '19 at 11:00

windowed time domain statistics in R: how to window one column and apply statistic methods to others

3 Answers3