R: Group by Sequence of Rows

Question

I have a data frame from IMDB. Unfortunately it is not grouped, but I was hoping I could make R group it.

The data look like this:

V1                  V2              starts_with
NM: Aarons          Alex            NM
DB: 15 May 1890     Philadelphia    DB
NM: Aarons          Leroy           NM
NM: Aarons          Shawn           NM
DB: 26 March 1989   Jamaica         DB

What I would like is a new column containing a sequence based on the combination of the NM and DB rows. Every time there is a new "NM", the number should increase by one:

V1                  V2              starts_with     group
NM: Aarons          Alex            NM              1
DB: 15 May 1890     Philadelphia    DB              1
NM: Aarons          Leroy           NM              2
NM: Aarons          Shawn           NM              3
DB: 26 March 1989   Jamaica         DB              3

I of course searched SO and saw "generate sequence within group in R", but that example was already grouped. My data unfortunately aren't grouped.

Given `x <- c('NM', 'DB', 'NM', 'NM', 'DB')`, you would get your desired output using `cumsum(x == 'NM')`. Does this help with your real data? – markus, May 21 '19 at 20:54

score 2 · Accepted Answer · answered May 21 '19 at 20:54

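You can use `cumsum()`. The comparison `df$starts_with == "NM"` is TRUE on every NM row, and the cumulative sum of that logical vector goes up by one at each NM: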

df$group = cumsum(df$starts_with == "NM")
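To try this on the rows from the question, here is a small sketch. The data frame is rebuilt by hand from the sample in the post, so the column values are only what was shown there; your real data may be shaped differently.

```r
# Rebuild the sample rows from the question (values copied from the post)
df <- data.frame(
  V1 = c("NM: Aarons", "DB: 15 May 1890", "NM: Aarons",
         "NM: Aarons", "DB: 26 March 1989"),
  V2 = c("Alex", "Philadelphia", "Leroy", "Shawn", "Jamaica"),
  starts_with = c("NM", "DB", "NM", "NM", "DB"),
  stringsAsFactors = FALSE
)

# TRUE counts as 1 and FALSE as 0, so the running total
# increases by one each time an "NM" row appears
df$group <- cumsum(df$starts_with == "NM")

print(df$group)
# [1] 1 1 2 3 3
```

Two things to watch for: any rows that come before the first NM would get group 0, and an NA in `starts_with` would make every later value of `group` NA, so it may be worth checking for missing values first.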

akash87

Thank you! I hadn't ever heard of this function in my searching. Thank you! – tangerine7199 May 21 '19 at 20:59
