1

I have the following data frame in R :

Time      A

1         1
2         1
3         1
4         1
5         2
6         2
7         3
8         3
9         2
10        1
11        1
12        1
13        3
14        3
15        3

Let's consider numbers in Time column are second, i need to define a window of 3 seconds, and apply two or three different methods to A column and have the results for each function in separate columns, lets consider first function is Average, second function is max like this:

Time-window        average    max
   1                  1        1
   2                  2.5      2
   3                   4       3
   4                   1       1
   4                   3       3

How can i do it in R, using any of available libraries.

chessosapiens
  • 3,159
  • 10
  • 36
  • 58
  • What did you try so far? Did you links such as [link1](https://stackoverflow.com/questions/32078578/how-to-group-by-every-7-rows-and-aggregate-those-7-values-by-median), [link2](https://stackoverflow.com/questions/52339492/how-to-sum-every-nth-200-observation-in-a-data-frame-using-r)? – Sotos Sep 12 '19 at 07:39

3 Answers3

3

A data.table solution.

library(data.table)

dat <- setDT(dat)

dat2 <- dat[, `Time-window` := rep(1:(.N/3), each = 3)][
  , .(average = mean(A), max = max(A)), by = `Time-window`
]

dat2
#    Time-window  average max
# 1:           1 1.000000   1
# 2:           2 1.666667   2
# 3:           3 2.666667   3
# 4:           4 1.000000   1
# 5:           5 3.000000   3

DATA

dat <- read.table(text = "Time      A
1         1
2         1
3         1
4         1
5         2
6         2
7         3
8         3
9         2
10        1
11        1
12        1
13        3
14        3
15        3",
                  header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84
  • maybe `dat[,.(average = mean(A), max = max(A)), by = .("Time-window"=rep(1:(.N/3), each = 3))]`? – jangorecki Sep 12 '19 at 08:45
  • Thanks, what if instead of having window length as number of rows have it as a time for example every 0.2 second, i know it may not be applicable to this example – chessosapiens Sep 12 '19 at 10:59
  • @jangorecki Thanks for sharing. Now I see that we can create a new column in the `by` argument directly. This is cool. – www Sep 12 '19 at 13:23
  • 1
    @chessosapiens Every 0.2 second is not applicable in this example because you said your time step is second and every 0.2 second is smaller than that. However, if you can figure out a way to create the `Time-window` column, it should work. – www Sep 12 '19 at 13:24
1

If you prefer dplyr, you can do:

df %>%
 group_by(time_window = ceiling(Time/3)) %>%
 summarise_at(2, list(mean = mean, max = max))

  time_window  mean   max
  <fct>       <dbl> <int>
1 1            1        1
2 2            1.67     2
3 3            2.67     3
4 4            1        1
5 5            3        3

Or using gl() as already posted by @Ronak Shah for a base R solution:

df %>%
 group_by(time_window = gl(n()/3, 3)) %>%
 summarise_at(2, list(mean = mean, max = max))
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • Thanks, what if instead of having window length as number of rows have it as a time for example every 0.2 second, i know it may not be applicable to this example. – chessosapiens Sep 12 '19 at 10:53
  • If you have rows representing 0.2 seconds, that you can apply the same principle. Otherwise, I cannot really tell how do you mean it. – tmfmnk Sep 12 '19 at 11:13
  • imagine each row time difference is 0.014 seconds and i want to have window length of 0.2 seconds – chessosapiens Sep 12 '19 at 11:17
  • You can perhaps try a cumulative sum with reset. Reseting after reach 0.2 seconds. – tmfmnk Sep 12 '19 at 11:18
0

Create a function which applies all the functions you need

apply_fun <- function(x) {
    c(mean = mean(x), max = max(x))
}

Create a grouping column and apply the function by group

n <- 3
df$group <- gl(nrow(df)/n, n)
aggregate(A~group, df, apply_fun)

#  group   A.mean    A.max
#1     1 1.000000 1.000000
#2     2 1.666667 2.000000
#3     3 2.666667 3.000000
#4     4 1.000000 1.000000
#5     5 3.000000 3.000000
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • thanks, i need to add a minor thing what if i want to have the starting second for each window in group column ? for example 1 3 6 9 12 – chessosapiens Sep 12 '19 at 09:42
  • @chessosapiens you could add `df$Time[match(unique(df$group), df$group)]` to the final output dataframe. – Ronak Shah Sep 12 '19 at 10:04
  • I may need to make an edit to my original question, how can i define length of window not based on number of rows but based on a value, for example i may want to have time window of 0.2 second. – chessosapiens Sep 12 '19 at 10:52
  • @chessosapiens for that you might need to use `cut` or `findInterval` to divide it into based on `Time` value. Something like the following https://stackoverflow.com/questions/4126326/how-to-quickly-form-groups-quartiles-deciles-etc-by-ordering-columns-in-a – Ronak Shah Sep 12 '19 at 11:00