
I have a dataset like the one below. The outcome has no relationship with contact_date: when a subscriber responds to a cold call, we mark it as a successful contact attempt (1), otherwise (0). The count is how many times we have called the subscriber, up to and including that call.

   subscriber_id outcome contact_date  queue multiple_number count
           (int)   (int)       (date) (fctr)           (int) (int)
1              1       1   2015-01-29      2               1     1
2              1       0   2015-02-21      2               1     2
3              1       0   2015-03-29      2               1     3
4              1       1   2015-04-30      2               1     4
5              2       0   2015-01-29      2               1     1
6              2       0   2015-02-21      2               1     2
7              2       0   2015-03-29      2               1     3
8              2       0   2015-04-30      2               1     4
9              2       1   2015-05-31      2               1     5
10             2       1   2015-08-25      5               1     6
11             2       0   2015-10-30      5               1     7
12             2       0   2015-12-14      5               1     8
13             3       1   2015-01-29      2               1     1
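For anyone who wants to try the suggestions on this page, the printed sample can be rebuilt as a plain data.frame. This is a sketch: the values are copied from the table above, and the object name `df` is an assumption borrowed from nicola's comment (the question never names the object).

```r
# Rebuild the printed sample; values copied row by row from the table
df <- data.frame(
  subscriber_id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L),
  outcome         = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L),
  contact_date    = as.Date(c("2015-01-29", "2015-02-21", "2015-03-29", "2015-04-30",
                              "2015-01-29", "2015-02-21", "2015-03-29", "2015-04-30",
                              "2015-05-31", "2015-08-25", "2015-10-30", "2015-12-14",
                              "2015-01-29")),
  # queue is printed as (fctr), so store it as a factor
  queue           = factor(c(2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 2)),
  multiple_number = rep(1L, 13),
  count           = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L)
)
str(df)
```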

I would like to get the count value of the first call with outcome == 1 for each subscriber. Could you please tell me how I can get it? The final data set I would like is as follows. (Please note that some subscribers may not have any successful call; in that case, I would like to mark first_success as 0.)

subscriber_id first_success
1                1
2                5
3                1
...
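The rule being asked for can be stated without relying on row order: the first successful call is the one with the smallest count among calls with outcome == 1, and a subscriber with no such call gets 0. A minimal base R sketch of that rule follows; the helper name `first_success_count` and subscriber 4 (who never answers) are hypothetical additions for illustration, not part of the question's data.

```r
# Hypothetical helper: count of the first successful call (the smallest count
# among calls with outcome == 1), or 0 when the subscriber never answered.
first_success_count <- function(outcome, count) {
  ok <- outcome == 1
  if (any(ok)) min(count[ok]) else 0L
}

# Subscribers 1-3 from the question plus a hypothetical subscriber 4 with no success
df <- data.frame(
  subscriber_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L),
  outcome       = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L),
  count         = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 1L, 2L)
)

# Split the rows by subscriber and apply the helper to each group
groups <- split(df, df$subscriber_id)
result <- data.frame(
  subscriber_id = as.integer(names(groups)),
  first_success = unname(vapply(groups,
                                function(g) first_success_count(g$outcome, g$count),
                                integer(1)))
)
print(result)
```

Because it takes the minimum count rather than the first matching row, this version does not care how the rows are sorted.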
ekad
Jianan He
  • Try `aggregate(df[,2],df[1],FUN=function(x) match(1,x))`. – nicola Feb 22 '16 at 21:15
  • @nicola I think they have one of the `dplyr` classes; my guess is it won't work because you can't get a vector out, only a `data.frame`... – David Arenburg Feb 22 '16 at 21:16
  • Possible duplicate of [Extract rows for the first occurrence of a variable in a data frame](http://stackoverflow.com/questions/19944334/extract-rows-for-the-first-occurrence-of-a-variable-in-a-data-frame) – germcd Feb 22 '16 at 21:19
  • @DavidArenburg I tried and it works. Why shouldn't it? – nicola Feb 22 '16 at 21:21
  • @nicola This works for me I think! Thank you! – Jianan He Feb 22 '16 at 21:35
  • @nicola you are right, it does. I wasn't sure if `aggregate` would work on `data.frame`s instead of columns. `tapply` won't work though; compare `tapply(mtcars[, 1], mtcars[,2], sum)` with `tapply(as.tbl(mtcars)[,1],as.tbl(mtcars)[,2], sum)`, for instance. I thought you weren't aware because you used `df[,2]`, which still returns a `data.frame` rather than a vector on a `dplyr` object, as in `class(as.tbl(mtcars)[,1])`. – David Arenburg Feb 22 '16 at 21:50
  • @DavidArenburg Thanks for the comment. However, `aggregate` is supposed to work with `data.frame`s: even if the first argument is not one, it gets coerced to one (unless it is a time series). So, regardless of the class of `df[,2]`, `aggregate` coerces it to a `data.frame` in any case. See `?aggregate`. The only way `aggregate` couldn't work is if `dplyr` had defined its own `aggregate` method with different logic, but that's not the case. – nicola Feb 22 '16 at 22:01
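nicola's `aggregate()` suggestion returns the position of the first 1 within each subscriber. That position equals the count only if each subscriber's rows are sorted by call and count starts at 1 (true for the sample), and it gives NA rather than 0 for a subscriber with no success. A sketch of the extra post-processing, using named columns instead of `df[,2]` and a hypothetical subscriber 4 who never answered:

```r
# Subscribers 1-3 from the question plus a hypothetical subscriber 4 with no success;
# rows are assumed to be in call order within each subscriber
df <- data.frame(
  subscriber_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L),
  outcome       = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L),
  count         = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 1L, 2L)
)

# match(1L, x) is the position of the first success in each group, or NA if none
res <- aggregate(df["outcome"], by = df["subscriber_id"],
                 FUN = function(x) match(1L, x))
names(res)[names(res) == "outcome"] <- "first_success"

# Mark subscribers without any successful call as 0, as the question asks
res$first_success[is.na(res$first_success)] <- 0L
print(res)
```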

1 Answer

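library(dplyr)

# For each subscriber, keep the earliest successful call and report its count.
# Subscribers with no successful call are dropped by filter().
data %>%
  group_by(subscriber_id) %>%
  filter(outcome == 1) %>%
  slice(which.min(contact_date)) %>%
  ungroup() %>%
  select(subscriber_id, first_success = count)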
– count
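Because `filter(outcome == 1)` removes subscribers who never had a successful call, those subscribers end up missing rather than marked 0. One way to keep them is to swap the `filter()`/`slice()` steps for a single `summarise()` per subscriber. This sketch assumes count numbers the calls in order, and subscriber 4 is a hypothetical subscriber with no success, added only to show the 0 case.

```r
library(dplyr)

# Subscribers 1-3 from the question plus a hypothetical subscriber 4 with no success
df <- data.frame(
  subscriber_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L),
  outcome       = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L),
  count         = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 1L, 2L)
)

# One row per subscriber: smallest count among successful calls, else 0.
# Both branches return an integer so summarise() can combine the groups.
result <- df %>%
  group_by(subscriber_id) %>%
  summarise(first_success = if (any(outcome == 1)) min(count[outcome == 1]) else 0L)
print(result)
```

Since every group produces exactly one row, no subscriber can disappear from the result.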