
I have a data frame with 12511 rows and 16 columns, obtained from an animal running experiment. Each row holds the running counts for one minute for each animal. I would like to sum the columns over every 60 rows (that is, counts per hour).

I tried to use apply function for summing 60 rows:

apply(rw[1:60,],2,sum) 
apply(rw[61:120,],2,sum)
apply(rw[121:180,],2,sum)

... but continuing like this up to row 12511 is unthinkable and time consuming.

I am sure there is a smarter way to condense my data into hourly sums (208 full hours plus one partial hour). Please help!

Thank you.

asked by henok (edited by David Arenburg)

2 Answers


Here's an approach using the data.table package and the vectorized colSums function.

Some data first:

set.seed(123)
rw <- data.frame(a = sample(12511), b = sample(12511), c = sample(12511))

Then we will create an index using gl and run colSums per group:

library(data.table)
setDT(rw)[, as.list(colSums(.SD)), by = gl(ceiling(12511/60), 60, 12511)]
#       gl      a      b      c
#   1:   1 378678 387703 388143
#   2:   2 384532 331275 341092
#   3:   3 355397 367039 369012
#   4:   4 378483 355384 367988
#   5:   5 365193 372779 388020
# ---                         
# 205: 205 385361 409004 389946
# 206: 206 407232 406940 345496
# 207: 207 363253 357317 356878
# 208: 208 387336 383786 348978
# 209: 209 186874 188616 183500
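To make the grouping explicit: `gl(n, k, length)` generates a factor with `n` levels, each level repeated `k` times, cut off at `length`. A small-scale illustration (a sketch, not part of the answer itself):

```r
# gl(n, k, length): n levels, each repeated k times, truncated at 'length'.
# Here: blocks of 3 over 10 rows, mirroring blocks of 60 over 12511 rows.
idx <- gl(ceiling(10/3), 3, 10)
as.integer(idx)
# [1] 1 1 1 2 2 2 3 3 3 4
```

The last level is simply shorter than the rest, which is exactly how the partial final hour (12511 = 208 * 60 + 31) is handled.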

Another similar approach would be

setDT(rw)[, lapply(.SD, sum), by = gl(ceiling(12511/60), 60, 12511)]

Or, using dplyr's summarise_each function, we could similarly do

library(dplyr)
rw %>%
  group_by(indx = gl(ceiling(12511/60), 60, 12511)) %>%
  summarise_each(funs(sum))
# Source: local data table [209 x 4]
# 
#    indx      a      b      c
# 1     1 378678 387703 388143
# 2     2 384532 331275 341092
# 3     3 355397 367039 369012
# 4     4 378483 355384 367988
# 5     5 365193 372779 388020
# 6     6 387260 386737 347777
# 7     7 343980 412633 383429
# 8     8 355059 352393 336798
# 9     9 372722 386863 425622
# 10   10 406628 370606 362041
# ..  ...    ...    ...    ...
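For completeness, base R's vectorized `rowsum()` can compute the same group-wise column sums without any extra packages (a sketch on the same simulated data; not one of the approaches above):

```r
set.seed(123)
rw <- data.frame(a = sample(12511), b = sample(12511), c = sample(12511))

# same index as above: 209 groups of (at most) 60 rows each
idx <- gl(ceiling(nrow(rw)/60), 60, nrow(rw))

# rowsum() sums the rows of each column within each group
res <- rowsum(rw, idx)
dim(res)
# [1] 209   3
```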
answered by David Arenburg
  • I like the index generation using `gl()` and `ceiling()`. Solution deals with unequal group size. +1 ! For the `dplyr` approach, I came up with `group_by(id = gl(ceiling(nrow(.)/60), 60, nrow(.)))` in case his dataset input isn't *always* 12511 – Steven Beaupré Feb 01 '15 at 21:03
  • Thanks a lot, I did not notice that the first code by @RStudent ignores some of my data. I just tried your code on the complete data, but I could not figure out what the (---) in the output means. How can I make all the rows visible? :) – henok Feb 01 '15 at 21:27
  • Save the result somewhere and then use `View` on it. – David Arenburg Feb 01 '15 at 21:28
  • I have just started using R, and programming in general :) – henok Feb 01 '15 at 21:39
  • `Res <- setDT(rw)[, as.list(colSums(.SD)) , gl(ceiling(12511/60), 60, 12511)] ; View(Res)` – David Arenburg Feb 01 '15 at 21:40
  • Many thanks for your assistance! Your code lines are fancy; it will take me some time to understand exactly what you did. Appreciated! – henok Feb 01 '15 at 21:58

Here is my version using David Arenburg's data:

set.seed(123)
rw <- data.frame(a = sample(12511), b = sample(12511), c = sample(12511))

# group sizes: 208 full blocks of 60 rows, plus one final block of 31 rows
ind <- c(rep(60, floor(nrow(rw)/60)), nrow(rw) %% 60)
# expand the sizes into a grouping vector: 1 repeated 60x, 2 repeated 60x, ..., 209 repeated 31x
ind <- rep(seq_along(ind), times = ind)
# sum each column within each group
head(apply(rw, 2, function(x) tapply(x, ind, sum)))
#        a      b      c
# 1 378678 387703 388143
# 2 384532 331275 341092
# 3 355397 367039 369012
# 4 378483 355384 367988
# 5 365193 372779 388020
# 6 387260 386737 347777
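As a sanity check, this hand-built ind vector is equivalent to the gl() index from the accepted answer; a quick small-scale comparison (illustrative sketch):

```r
n <- 10; k <- 3
sizes <- c(rep(k, floor(n/k)), n %% k)  # group sizes: 3 3 3 1
ind   <- rep(seq_along(sizes), times = sizes)
ind
# [1] 1 1 1 2 2 2 3 3 3 4
# matches the gl()-based index from the other answer
identical(ind, as.integer(gl(ceiling(n/k), k, n)))
# [1] TRUE
# caveat: if n were an exact multiple of k, the last size would be 0
# and that empty trailing group would need to be dropped first
```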
answered by DatamineR