I have a dataframe df with a column called ID. Multiple rows may have the same ID and I want to set a column value "occurrence" to indicate how many times the ID has been seen before.

for (i in unique(df$ID)) {
   rows = df[df$ID==i, ]
   for (idx in 1:nrow(rows)) {
      rows[idx,'occurrence'] = idx
   }
}

Unfortunately, this adds the occurrence column to rows, but it does not update the original data frame. How do I get the occurrence column added to df?

Update: The row_number() function pointed out by neilfws works great. I have a follow-up question: the data frame also has a Year column, and I need to add a new column (say Prev.Year.For.This.ID) holding the year of the previous occurrence of the ID. For example, if the input is

Year = c(1991,1992,1993,1994,1995)
ID = c(1,2,1,2,1)
df <- data.frame(Year, ID)

I'd like the output to look like this:

ID Year occurrence Prev.Year.For.This.ID
1  1991     1           <NA>  
2  1992     1           <NA>
1  1993     2           1991
2  1994     2           1992
1  1995     3           1993
user1001630

3 Answers

You can use dplyr to group_by ID; row_number() then gives a running count of occurrences within each ID.

library(dplyr)

df1 <- data.frame(ID = c(1,2,3,1,4,5,6,2,7,8,2))
df1 %>% 
  group_by(ID) %>% 
  mutate(cnt = row_number()) %>%
  ungroup()

      ID   cnt
   <dbl> <int>
 1     1     1
 2     2     1
 3     3     1
 4     1     2
 5     4     1
 6     5     1
 7     6     1
 8     2     2
 9     7     1
10     8     1
11     2     3
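
For the follow-up in the question (the year of the previous occurrence of each ID), one possible approach is dplyr's lag() inside the same grouping. This is a minimal sketch rather than part of the original answer. It assumes the Year values from the question's expected output (second row 1992) and that sorting by Year gives the intended order of occurrences.

```r
library(dplyr)

# Sample data; Year values follow the expected output table in the question
# (assumption: the second row is 1992).
df <- data.frame(Year = c(1991, 1992, 1993, 1994, 1995),
                 ID   = c(1, 2, 1, 2, 1))

result <- df %>%
  arrange(Year) %>%                               # lag() follows row order, so sort first
  group_by(ID) %>%
  mutate(occurrence = row_number(),               # 1, 2, 3, ... within each ID
         Prev.Year.For.This.ID = lag(Year)) %>%   # NA on the first row of each ID
  ungroup()

result
```

Because lag() is evaluated per group, the first row of each ID has no predecessor and gets NA, which matches the <NA> entries in the desired output.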
neilfws

Are you after something like the following (I made up sample data for you):

library(dplyr)
df = data.frame(ID = c(1,1,1,2,2,3))
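# row_number() - 1 counts the earlier rows with the same ID (0 on first sight);
# unlike cumsum(ID / ID) - 1 it also works when an ID is 0 or non-numeric
answer = df %>% group_by(ID) %>% mutate(occurrence = row_number() - 1) %>% as.data.frame()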

This will give something which looks like this:

ID    occurrence
1     0
1     1
1     2
2     0
2     1
3     0

The dplyr package is a great tool for grouping and summarising data. I also find the code very readable when I use the pipe %>% (though, admittedly, it does take some getting used to).
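
If adding a package is not an option, base R can probably do the same job. The loop in the question assigns into rows, which is a copy of the subset, so df itself never changes. The sketch below writes straight into df instead. The sample data is made up for illustration, and ave() is one of several base functions that could be used here.

```r
# Made-up example data (not from the question)
df <- data.frame(ID = c(1, 2, 3, 1, 4, 2, 2))

# ave() applies FUN within each group of df$ID and returns the results in the
# original row order; seq_along() numbers the rows of each group 1..n.
df$occurrence <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

df
```

Subtracting 1 from the result gives the zero-based count of earlier sightings used in the dplyr example above.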

lebelinoz

With data.table, sequence(.N) numbers the rows within each ID group, and := adds the result to the table by reference:

> library(data.table)
> df <- data.table(ID = c(1,1,1,2,2,3))
> df[, occurrence := sequence(.N), by = ID]
> df
   ID occurrence
1:  1          1
2:  1          2
3:  1          3
4:  2          1
5:  2          2
6:  3          1
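
The same grouping could also cover the follow-up about the previous year, using data.table's shift(). This is a sketch under assumptions, not part of the original answer: the Year values are taken from the question's expected output, and the table is sorted by Year so that "previous" means the earlier year.

```r
library(data.table)

# Sample data; Year values follow the expected output in the question (assumption)
dt <- data.table(Year = c(1991, 1992, 1993, 1994, 1995),
                 ID   = c(1, 2, 1, 2, 1))

setorder(dt, Year)  # shift() follows row order, so sort first

# Add both columns by reference, computed separately for each ID
dt[, `:=`(occurrence = seq_len(.N),
          Prev.Year.For.This.ID = shift(Year)),  # default: lag by 1, fill with NA
   by = ID]

dt
```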
Prasanna Nandakumar