How to group by a fixed number of rows in dplyr?

Question

I have a data frame:

set.seed(123)
x <- sample(10)
y <- x^2
my.df <- data.frame(x, y)

The result is this:

What I want is to group the rows into consecutive blocks of n rows and compute the mean, sum, or any other summary on each block. Something like this for n = 5:

my.df %>% group_by(5) %>% summarise(sum = sum(y), mean = mean(y))

The expected output would be something like:

# A tibble: 2 x 2
     sum   mean
   <dbl>  <dbl>
1    174   34.8
2    211   42.2

Of course, the number of rows in the data frame could be 15, 20, 100, whatever. I still want to group the data every n rows.

How can I do this?

Relevant post: https://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r — zx8754, Mar 18 '19 at 16:50

score 12 · Accepted Answer · answered Mar 03 '19 at 11:54

We can use rep or gl to create a grouping variable that assigns each consecutive run of 5 rows to the same group:

library(dplyr)
my.df %>% 
    group_by(grp = as.integer(gl(n(), 5, n()))) %>% 
    #or with rep
    # group_by(grp = rep(row_number(), length.out = n(), each = 5)) 
    summarise(sum = sum(y), mean = mean(y))
# A tibble: 2 x 3
#    grp   sum  mean
#  <int> <dbl> <dbl>
#1     1   174  34.8
#2     2   211  42.2
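As a follow-up to the gl/rep idea, here is a minimal sketch of the same grouping with the block size held in a variable and the group index built by integer division. The names df, chunk_size, and res are made up for this illustration, and the data is a fixed 12-row example rather than the sampled data from the question, so the last block is deliberately incomplete.

```r
library(dplyr)

# Hypothetical example data: 12 rows, so the last block has only 2 rows.
df <- data.frame(x = 1:12, y = (1:12)^2)

# Block size (an assumption for this sketch; change as needed).
chunk_size <- 5

res <- df %>%
  # (row_number() - 1) %/% chunk_size is 0 for rows 1..5, 1 for rows 6..10, ...
  # Adding 1 makes the group labels start at 1.
  group_by(grp = (row_number() - 1) %/% chunk_size + 1) %>%
  summarise(rows = n(), sum = sum(y), mean = mean(y))

print(res)
```

The rows column makes it easy to spot a trailing block that is shorter than chunk_size; whether to keep or drop such a block depends on the use case.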

score 4 · Answer 2 · answered Dec 27 '19 at 21:03

Another option could be:

my.df %>%
 group_by(x = ceiling(row_number()/5)) %>%
 summarise_all(list(sum = sum, mean = mean))

      x   sum  mean
  <dbl> <dbl> <dbl>
1     1   174  34.8
2     2   211  42.2
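Two things are worth noting about the snippet above: naming the grouping column x replaces the original x column, and summarise_all() is superseded by across() in dplyr 1.0.0 and later. A hedged sketch of the same idea with across() follows. The column name grp and the hard-coded x values are assumptions for this example; the values were chosen so that the block sums match the 174 and 211 shown in the question, because the result of sample() for a given seed can differ between R versions.

```r
library(dplyr)

# Hard-coded data (assumption): chosen so the block sums are 174 and 211,
# independent of how sample() behaves in the installed R version.
df <- data.frame(x = c(3, 8, 4, 7, 6, 1, 10, 9, 2, 5))
df$y <- df$x^2

res <- df %>%
  # ceiling(row_number() / 5) is 1 for rows 1..5, 2 for rows 6..10, ...
  # Using a new name (grp) keeps the original x column out of the grouping.
  group_by(grp = ceiling(row_number() / 5)) %>%
  # across() applies each function in the list to y; the result columns
  # are named y_sum and y_mean.
  summarise(across(y, list(sum = sum, mean = mean)))

print(res)
```

Requires dplyr 1.0.0 or later for across(). To summarise several columns at once, replace y with a selection such as c(x, y).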

tmfmnk

