0

I have a dataset that has observations for different case files. And I would like to create a variable that indicates the number of cases that have been dealt with of that kind before a specific case is looked into.

Here is a test code and dataset to specify what I am asking.

df <- data.frame( ID= c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),
                name = c("Jon", "Jon", "Maria","Jon", "Jon", "Maria","Jon", "Jon", "Maria","Prince", "Jon", "Maria","Prince", "Jon", "Maria","Prince"),
                date = c("2007-01-22", "2007-02-13", "2007-05-22", "2007-02-25", "2007-04-22", "2007-03-13", "2007-03-22", "2007-07-13", "2007-08-22",
                         "2007-05-10", "2007-04-18", "2007-07-09","2007-06-10", "2008-02-13","2007-09-22", "2007-05-15"))

I would like to group the observations into categories and for each observation check the date and give a count of the number of observations in that category before the stated observation.

df$date <- as.Date(df$date, '%Y-%m-%d')    
df$exp = NA
for(i in 1:nrow(df)){
  temp = df %>% filter(!is.na(date))
  temp = temp %>% filter(name == name[i])
  df$exp[i]= nrow( filter(temp,date[i]>date))
}

I tried run the code above but doesn't give the results I am looking for. It gives me the following results

    ID   name       date exp
1   1    Jon 2007-01-22   0
2   2    Jon 2007-02-13   1
3   4    Jon 2007-02-25   5
4   7    Jon 2007-03-22   4
5  11    Jon 2007-04-18   0
6   5    Jon 2007-04-22   3
7   8    Jon 2007-07-13   7
8  14    Jon 2008-02-13   0
9   6  Maria 2007-03-13   0
10  3  Maria 2007-05-22   3
11 12  Maria 2007-07-09   0
12  9  Maria 2007-08-22   0
13 15  Maria 2007-09-22   0
14 10 Prince 2007-05-10   0
15 16 Prince 2007-05-15   0
16 13 Prince 2007-06-10   0

instead of

ID   name       date exp
1   1    Jon 2007-01-22   0
2   2    Jon 2007-02-13   1
3   4    Jon 2007-02-25   2
4   7    Jon 2007-03-22   3
5  11    Jon 2007-04-18   4
6   5    Jon 2007-04-22   5
7   8    Jon 2007-07-13   6
8  14    Jon 2008-02-13   7
9   6  Maria 2007-03-13   0
10  3  Maria 2007-05-22   1
11 12  Maria 2007-07-09   2
12  9  Maria 2007-08-22   3
13 15  Maria 2007-09-22   4
14 10 Prince 2007-05-10   0
15 16 Prince 2007-05-15   1
16 13 Prince 2007-06-10   2

How can I efficiently get this done?

Prince 1
  • 3
  • 1

1 Answers1

0

You can sort by name and date, make groups by name and use the row_number to get the result

library(tidyverse)
df %>% 
  arrange(name, as.Date(date)) %>% 
  group_by(name) %>% 
  mutate(n = row_number() - 1)

# A tibble: 16 x 4
# Groups:   name [3]
      ID name   date           n
   <dbl> <chr>  <chr>      <dbl>
 1     1 Jon    2007-01-22     0
 2     2 Jon    2007-02-13     1
 3     4 Jon    2007-02-25     2
 4     7 Jon    2007-03-22     3
 5    11 Jon    2007-04-18     4
 6     5 Jon    2007-04-22     5
 7     8 Jon    2007-07-13     6
 8    14 Jon    2008-02-13     7
 9     6 Maria  2007-03-13     0
10     3 Maria  2007-05-22     1
11    12 Maria  2007-07-09     2
12     9 Maria  2007-08-22     3
13    15 Maria  2007-09-22     4
14    10 Prince 2007-05-10     0
15    16 Prince 2007-05-15     1
16    13 Prince 2007-06-10     2
Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14