Grouping by number sequences in R

Question

Say I have the iris dataset.

data(iris)

There are 150 rows in the dataset.

How can I group_by the first 50 rows, then rows 51:100, and finally rows 101:150?

Some Python code that I am following uses this (nothing to do with the iris data):

data.reset_index().assign(groupId = lambda row: row.index // 1000)

Wimpel · Answer 1 · 2019-09-27T06:47:58.617

A data.table approach:

library(data.table)
setDT(iris)[, .( mean( Sepal.Length ) ), by = .( rleid( 0:(nrow( iris ) - 1) %/% 50) )][]

rleid() is used to create the groups to summarise by. In this case, the mean of Sepal.Length for each group of 50 rows is calculated into column V1.

   rleid    V1
1:     1 5.006
2:     2 5.936
3:     3 6.588

edited Sep 27 '19 at 06:47

answered Sep 26 '19 at 17:39


akrun · Accepted Answer · 2019-09-26T18:29:17.183

One option would be gl (another is rep):

library(dplyr)
iris %>%
   group_by(grp = as.integer(gl(n(), 50, n()))) %>%
   summarise_if(is.numeric, mean)
# A tibble: 3 x 5
#    grp Sepal.Length Sepal.Width Petal.Length Petal.Width
#  <int>        <dbl>       <dbl>        <dbl>       <dbl>
#1     1         5.01        3.43         1.46       0.246
#2     2         5.94        2.77         4.26       1.33 
#3     3         6.59        2.97         5.55       2.03
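
The rep alternative mentioned above is not spelled out, so here is a minimal base R sketch of how it might look. The names chunk_size, grp_rep and means_rep are made up for illustration, and the chunk size of 50 is taken from the question.

```r
# Assumption: 50 rows per group, as in the question.
chunk_size <- 50
n <- nrow(iris)

# each = repeats every group id chunk_size times;
# length.out = trims the last group if n is not a multiple of chunk_size.
grp_rep <- rep(seq_len(ceiling(n / chunk_size)), each = chunk_size, length.out = n)

# Mean of Sepal.Length per group of 50 rows (base R, no packages needed).
means_rep <- tapply(iris$Sepal.Length, grp_rep, mean)
means_rep
```

The same grp_rep vector could also be passed to group_by() in place of the gl() call.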

Another option is integer division with %/%:

(seq_len(nrow(iris))-1) %/% 50 + 1
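
As a rough sketch, the integer-division index can be used directly as a grouping column, which is probably the closest match to the Python snippet in the question. The column names groupId and mean_sepal_length and the variable chunk_size are chosen here for illustration only.

```r
library(dplyr)

# Assumption: 50 rows per group, as in the question.
chunk_size <- 50

res <- iris %>%
  # (row_number() - 1) %/% chunk_size is 0 for rows 1-50, 1 for rows 51-100, ...
  # Adding 1 makes the group ids start at 1.
  mutate(groupId = (row_number() - 1) %/% chunk_size + 1) %>%
  group_by(groupId) %>%
  summarise(mean_sepal_length = mean(Sepal.Length))

res
```

Dropping the `+ 1` gives zero-based group ids, like `index // 1000` does in the Python version.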
