Setting column value of a subset of rows in a dataframe in R

Question

I have a dataframe df with a column called ID. Multiple rows may have the same ID and I want to set a column value "occurrence" to indicate how many times the ID has been seen before.

for (i in unique(df$ID)) {
   rows = df[df$ID==i, ]
   for (idx in 1:nrow(rows)) {
      rows[idx,'occurrence'] = idx
   }
}

Unfortunately, this adds the occurrence column to rows, but it does not update the original data frame. How do I get the occurrence column added to df?

Update: The row_number() function pointed out by neilfws works great. I have a follow-up question: the data frame also has a Year column, and I need to add a new column (say Prev.Year.For.This.ID) holding the year of the previous occurrence of the ID. For example, if the input is

Year = c(1991,1992,1993,1994,1995)
ID = c(1,2,1,2,1)
df <- data.frame(Year, ID)

I'd like the output to look like this:

ID  Year  occurrence  Prev.Year.For.This.ID
1   1991  1           <NA>
2   1992  1           <NA>
1   1993  2           1991
2   1994  2           1992
1   1995  3           1993

Please supply sample data to make this reproducible. – www Aug 31 '17 at 05:38

score 3 · Accepted Answer · answered Aug 31 '17 at 05:52

You can use dplyr to group the rows by ID with group_by(); row_number() then gives a running count of occurrences within each ID.

library(dplyr)

df1 <- data.frame(ID = c(1,2,3,1,4,5,6,2,7,8,2))
df1 %>% 
  group_by(ID) %>% 
  mutate(cnt = row_number()) %>%
  ungroup()

      ID   cnt
   <dbl> <int>
 1     1     1
 2     2     1
 3     3     1
 4     1     2
 5     4     1
 6     5     1
 7     6     1
 8     2     2
 9     7     1
10     8     1
11     2     3
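
For the follow-up in the question (the year of the previous occurrence of each ID), one possible sketch uses dplyr's lag() inside the same grouping. It assumes the rows are already sorted by Year and takes the years from the desired output table in the question; the column names are the ones the question proposes.

```r
library(dplyr)

# Years follow the desired output table in the question (1991 to 1995).
df <- data.frame(Year = c(1991, 1992, 1993, 1994, 1995),
                 ID   = c(1, 2, 1, 2, 1))

result <- df %>%
  group_by(ID) %>%
  mutate(
    occurrence = row_number(),         # 1, 2, 3, ... within each ID
    Prev.Year.For.This.ID = lag(Year)  # Year of the previous row with the same ID, NA for the first
  ) %>%
  ungroup()

result$Prev.Year.For.This.ID
# NA NA 1991 1992 1993
```

If the rows are not sorted by Year, sorting first with arrange(Year) would be needed for lag() to pick the chronologically previous row.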

neilfws

Thanks. I didn't know `row_number()` was a thing. – lebelinoz Aug 31 '17 at 05:59
I am always finding new "things" in dplyr too. Took me a while to realise how grouping would influence row numbers, it's not always intuitive. – neilfws Aug 31 '17 at 06:08
Thanks so much. Very elegant! – user1001630 Aug 31 '17 at 06:12

score 2 · Answer 2 · answered Aug 31 '17 at 05:48

Are you after something like the following? (I made up sample data for you.)

library(dplyr)
df = data.frame(ID = c(1,1,1,2,2,3))
answer = df %>% group_by(ID) %>% mutate(occurrence = cumsum(ID / ID) - 1) %>% as.data.frame

This will give something which looks like this:

ID    occurrence
1     0
1     1
1     2
2     0
2     1
3     0

The dplyr package is a great tool for grouping and summarising data. I also find the code very readable when I use the pipe %>% (though, admittedly, it does take some getting used to).
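
One caveat with cumsum(ID / ID): in R, 0 / 0 is NaN, so the count would break for an ID of 0. A variant sketch of the same zero-based count, using row_number() so that it does not depend on the ID values (the sample data here is hypothetical and deliberately includes an ID of 0):

```r
library(dplyr)

# Hypothetical sample data that includes an ID of 0
df <- data.frame(ID = c(0, 0, 1, 1, 1, 2))

answer <- df %>%
  group_by(ID) %>%
  mutate(occurrence = row_number() - 1L) %>%  # times this ID was seen before the current row
  ungroup() %>%
  as.data.frame()

answer$occurrence
# 0 1 0 1 2 0
```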

score 1 · Answer 3 · answered Aug 31 '17 at 06:02

> library(data.table)
> df = data.frame(ID = c(1,1,1,2,2,3))
> df <- data.table(df)
> df[, occurrence := sequence(.N), by = c("ID")]
> df
   ID occurrence
1:  1          1
2:  1          2
3:  1          3
4:  2          1
5:  2          2
6:  3          1
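
The same per-group counter is also possible in base R without extra packages. The sketch below, on hypothetical sample data, first shows why the loop in the question had no effect (df[df$ID == i, ] returns a copy, so edits to it never reach df) and then a loop-free version using ave().

```r
# Hypothetical sample data; only the ID column matters here
df <- data.frame(ID = c(1, 2, 1, 2, 1))

# Fixed loop: assign through df itself rather than through a copy of its rows
df$occurrence <- NA_integer_
for (i in unique(df$ID)) {
  is_match <- df$ID == i
  df$occurrence[is_match] <- seq_len(sum(is_match))
}

# Loop-free equivalent: ave() applies seq_along() within each ID group
df$occurrence2 <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

df$occurrence
# 1 1 2 2 3
```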
