I have a data frame from IMDB. Unfortunately it is not grouped, but I was hoping I could make R group it.

The data look like this:

V1                  V2              starts_with
NM: Aarons          Alex            NM
DB: 15 May 1890     Philadelphia    DB
NM: Aarons          Leroy           NM
NM: Aarons          Shawn           NM
DB: 26 March 1989   Jamaica         DB

I would like to add a new column: a sequence based on the combination of the NM and DB rows. Every time there is a new "NM", there will be a new number:

V1                  V2              starts_with     group
NM: Aarons          Alex            NM              1
DB: 15 May 1890     Philadelphia    DB              1
NM: Aarons          Leroy           NM              2
NM: Aarons          Shawn           NM              3
DB: 26 March 1989   Jamaica         DB              3

I of course searched SO and found "generate sequence within group in R", but that example was already grouped. My data unfortunately aren't grouped.

tangerine7199
  • You can do just `cumsum(df$starts_with == "NM")`. – tmfmnk May 21 '19 at 20:54
  • Given `x <- c('NM', 'DB', 'NM', 'NM', 'DB')` you would get your desired output using `cumsum(x == 'NM')` Does this help with your real data? – markus May 21 '19 at 20:54

1 Answer

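You can use `cumsum()`. Each "NM" row adds one to the running total, so every row gets the number of the most recent "NM" above it: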

df$group = cumsum(df$starts_with == "NM")
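
To see why this works, here is a minimal sketch that rebuilds the sample rows from the question and applies the same idea. The column names come from the question; the intermediate name `is_new_record` is only for illustration. It assumes the first row is an "NM" row and that `starts_with` has no missing values (a leading "DB" row would get group 0, and an `NA` would make every later group `NA`).

```r
# Rebuild the example data shown in the question.
df <- data.frame(
  V1 = c("NM: Aarons", "DB: 15 May 1890", "NM: Aarons",
         "NM: Aarons", "DB: 26 March 1989"),
  V2 = c("Alex", "Philadelphia", "Leroy", "Shawn", "Jamaica"),
  starts_with = c("NM", "DB", "NM", "NM", "DB"),
  stringsAsFactors = FALSE
)

# TRUE marks the first row of each record (illustrative helper name).
is_new_record <- df$starts_with == "NM"

# cumsum() treats TRUE as 1 and FALSE as 0, so the running total
# goes up by one at every "NM" row and stays flat on "DB" rows.
df$group <- cumsum(is_new_record)

print(df$group)
# [1] 1 1 2 3 3
```

The result matches the desired `group` column in the question without any prior grouping step.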
akash87