0

I have a data.frame with two columns (FirstName and State).

my.df = data.frame(FirstName = c('John', 'Paul', 'John', 'Sarah', 'Haley', 'Paul', 'John'),
                   State = c('VIC', 'NSW', 'VIC', 'QLD', 'TAS', 'NSW', 'VIC'))

FirstName State
   John   VIC
   Paul   NSW
   John   VIC
  Sarah   QLD
  Haley   TAS
   Paul   NSW
   John   VIC

I would like to include an additional column that lists the nth occurance for each value in the FirstName column. For example, 'John' appears in rows 1, 3 and 6 - the new column would therefore list '1' in row 1, '2' in row 3 (as this is the second time 'John' is listed) and '3' in row 6 (as this is the third time 'John' is listed).

My desired outcome would appear as follows:

FirstName State Index
   John   VIC     1
   Paul   NSW     1
   John   VIC     2
  Sarah   QLD     1
  Haley   TAS     1
   Paul   NSW     2
   John   VIC     3

Any assistance would be appreciated

Eddy_C
  • 3
  • 1
  • 2

2 Answers2

4

Or if you're feeling dplyr-ishly loopless:

new.df <- my.df %>% 
   group_by(FirstName) %>% 
   mutate(Index=1:n())

Or you can just use row_number()

Or using data.table

library(data.table)
setDT(my.df)[, Index := seq_len(.N), by = FirstName]

Or just base R

with(my.df, ave(seq(FirstName), FirstName, FUN = function(x) seq(length(x))))
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
conjugateprior
  • 398
  • 1
  • 15
0

Something quick and dirty...

my.df = data.frame(FirstName = c('John', 'Paul', 'John', 'Sarah', 'Haley', 'Paul', 'John'),
                   State = c('VIC', 'NSW', 'VIC', 'QLD', 'TAS', 'NSW', 'VIC'))

my.df$Index = 0
for(i in 1:nrow(my.df)){
  nameinds = which(my.df$FirstName==my.df$FirstName[i])
  my.df$Index[nameinds] = 1:length(nameinds)
}

print(my.df)
Digio
  • 151
  • 1
  • 4