I have a dataset that has observations for different case files. And I would like to create a variable that indicates the number of cases that have been dealt with of that kind before a specific case is looked into.
Here is a test code and dataset to specify what I am asking.
df <- data.frame( ID= c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),
name = c("Jon", "Jon", "Maria","Jon", "Jon", "Maria","Jon", "Jon", "Maria","Prince", "Jon", "Maria","Prince", "Jon", "Maria","Prince"),
date = c("2007-01-22", "2007-02-13", "2007-05-22", "2007-02-25", "2007-04-22", "2007-03-13", "2007-03-22", "2007-07-13", "2007-08-22",
"2007-05-10", "2007-04-18", "2007-07-09","2007-06-10", "2008-02-13","2007-09-22", "2007-05-15"))
I would like to group the observations into categories and for each observation check the date and give a count of the number of observations in that category before the stated observation.
df$date <- as.Date(df$date, '%Y-%m-%d')
df$exp = NA
for(i in 1:nrow(df)){
temp = df %>% filter(!is.na(date))
temp = temp %>% filter(name == name[i])
df$exp[i]= nrow( filter(temp,date[i]>date))
}
I tried run the code above but doesn't give the results I am looking for. It gives me the following results
ID name date exp
1 1 Jon 2007-01-22 0
2 2 Jon 2007-02-13 1
3 4 Jon 2007-02-25 5
4 7 Jon 2007-03-22 4
5 11 Jon 2007-04-18 0
6 5 Jon 2007-04-22 3
7 8 Jon 2007-07-13 7
8 14 Jon 2008-02-13 0
9 6 Maria 2007-03-13 0
10 3 Maria 2007-05-22 3
11 12 Maria 2007-07-09 0
12 9 Maria 2007-08-22 0
13 15 Maria 2007-09-22 0
14 10 Prince 2007-05-10 0
15 16 Prince 2007-05-15 0
16 13 Prince 2007-06-10 0
instead of
ID name date exp
1 1 Jon 2007-01-22 0
2 2 Jon 2007-02-13 1
3 4 Jon 2007-02-25 2
4 7 Jon 2007-03-22 3
5 11 Jon 2007-04-18 4
6 5 Jon 2007-04-22 5
7 8 Jon 2007-07-13 6
8 14 Jon 2008-02-13 7
9 6 Maria 2007-03-13 0
10 3 Maria 2007-05-22 1
11 12 Maria 2007-07-09 2
12 9 Maria 2007-08-22 3
13 15 Maria 2007-09-22 4
14 10 Prince 2007-05-10 0
15 16 Prince 2007-05-15 1
16 13 Prince 2007-06-10 2
How can I efficiently get this done?