r - nth occurrence in a dataframe

Question

I have a data.frame with two columns (FirstName and State).

my.df = data.frame(FirstName = c('John', 'Paul', 'John', 'Sarah', 'Haley', 'Paul', 'John'),
                   State = c('VIC', 'NSW', 'VIC', 'QLD', 'TAS', 'NSW', 'VIC'))

FirstName State
   John   VIC
   Paul   NSW
   John   VIC
  Sarah   QLD
  Haley   TAS
   Paul   NSW
   John   VIC

I would like to include an additional column that lists the nth occurance for each value in the FirstName column. For example, 'John' appears in rows 1, 3 and 6 - the new column would therefore list '1' in row 1, '2' in row 3 (as this is the second time 'John' is listed) and '3' in row 6 (as this is the third time 'John' is listed).

My desired outcome would appear as follows:

FirstName State Index
   John   VIC     1
   Paul   NSW     1
   John   VIC     2
  Sarah   QLD     1
  Haley   TAS     1
   Paul   NSW     2
   John   VIC     3

Any assistance would be appreciated

score 4 · Accepted Answer · edited Jul 23 '15 at 13:39

4

Or if you're feeling dplyr-ishly loopless:

new.df <- my.df %>% 
   group_by(FirstName) %>% 
   mutate(Index=1:n())

Or you can just use row_number()

Or using data.table

library(data.table)
setDT(my.df)[, Index := seq_len(.N), by = FirstName]

Or just base R

with(my.df, ave(seq(FirstName), FirstName, FUN = function(x) seq(length(x))))

edited Jul 23 '15 at 13:39

David Arenburg

91,361
17
137
196

answered Jul 23 '15 at 13:10

conjugateprior

398
1
15

No I'm not 'kidding you'. I just didn't know about `row_number`. Maybe feign a little friendliness, unless that's what passes for friendly over here. – conjugateprior Jul 23 '15 at 13:26
And I was just doing the same myself... – conjugateprior Jul 23 '15 at 13:36

Digio · Answer 2 · 2015-07-24T11:15:17.353

0

Something quick and dirty...

my.df = data.frame(FirstName = c('John', 'Paul', 'John', 'Sarah', 'Haley', 'Paul', 'John'),
                   State = c('VIC', 'NSW', 'VIC', 'QLD', 'TAS', 'NSW', 'VIC'))

my.df$Index = 0
for(i in 1:nrow(my.df)){
  nameinds = which(my.df$FirstName==my.df$FirstName[i])
  my.df$Index[nameinds] = 1:length(nameinds)
}

print(my.df)

edited Jul 24 '15 at 11:15

answered Jul 23 '15 at 11:03

Digio

151
1
4

r - nth occurrence in a dataframe

2 Answers2