
Say I have the iris dataset.

data(iris)

There are 150 rows in the dataset.

How can I group_by the first 50 rows, then rows 51:100, and finally rows 101:150?

Some Python code that I am following does this (it has nothing to do with the iris data):

data.reset_index().assign(groupId = lambda row: row.index // 1000)
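
For context, the pandas line above assigns each row a group id by floor-dividing its zero-based position by the block size. Below is a minimal stdlib-only sketch of that arithmetic, assuming a block size of 50 and 150 rows to match iris (the original snippet uses 1000; the names `n_rows`, `group_size`, and `group_ids` are made up for illustration):

```python
from collections import Counter

n_rows = 150      # iris has 150 rows
group_size = 50   # assumed block size for this illustration

# Floor-dividing zero-based positions 0..149 by the block size gives
# group id 0 for rows 0-49, 1 for rows 50-99, and 2 for rows 100-149.
group_ids = [i // group_size for i in range(n_rows)]

# Count how many rows land in each group.
group_counts = Counter(group_ids)
print(sorted(group_counts.items()))  # [(0, 50), (1, 50), (2, 50)]
```

Note that these ids start at 0, whereas R row numbers start at 1, so an R translation has to subtract 1 from the row number before dividing.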
user8959427

2 Answers


A data.table approach:

library(data.table)
setDT(iris)[, .( mean( Sepal.Length ) ), by = .( rleid( 0:(nrow( iris ) - 1) %/% 50) )][]

rleid() creates the groups to summarise by. In this case, the mean of Sepal.Length for each group of 50 rows is calculated into column V1.

   rleid    V1
1:     1 5.006
2:     2 5.936
3:     3 6.588
Wimpel

One option would be gl (another is rep):

library(dplyr)
iris %>%
   group_by(grp = as.integer(gl(n(), 50, n()))) %>%
   summarise_if(is.numeric, mean)
# A tibble: 3 x 5
#    grp Sepal.Length Sepal.Width Petal.Length Petal.Width
#  <int>        <dbl>       <dbl>        <dbl>       <dbl>
#1     1         5.01        3.43         1.46       0.246
#2     2         5.94        2.77         4.26       1.33 
#3     3         6.59        2.97         5.55       2.03 

Another option is integer division with %/%, which gives the group index for each row:

(seq_len(nrow(iris))-1) %/% 50 + 1
akrun