Find index of first specific result in each group in R

Question

I have a dataset as below. The outcome has no relationship with contact_date: when a subscriber responds to a cold call, we mark it as a successful contact attempt (1), otherwise (0). The count is how many times we have called the subscriber so far.

 subscriber_id   outcome contact_date  queue multiple_number count
           (int)   (int)       (date) (fctr)           (int) (int)
1              1       1   2015-01-29      2               1     1
2              1       0   2015-02-21      2               1     2
3              1       0   2015-03-29      2               1     3
4              1       1   2015-04-30      2               1     4
5              2       0   2015-01-29      2               1     1
6              2       0   2015-02-21      2               1     2
7              2       0   2015-03-29      2               1     3
8              2       0   2015-04-30      2               1     4
9              2       1   2015-05-31      2               1     5
10             2       1   2015-08-25      5               1     6
11             2       0   2015-10-30      5               1     7
12             2       0   2015-12-14      5               1     8
13             3       1   2015-01-29      2               1     1

I would like to get the count value of the first row with outcome == 1 for each subscriber. How can I get it? The final dataset I would like is shown below. (Please note that some subscribers may not have any successful call; in that case, I would like to mark first_success as 0.)

subscriber_id first_success
1                1
2                5
3                1
...

@nicola I think they have one of the `dplyr` classes, my guess it won't work cause you can't get a vector out- only a `data.frame`... — David Arenburg, Feb 22 '16 at 21:16
Possible duplicate of [Extract rows for the first occurrence of a variable in a data frame](http://stackoverflow.com/questions/19944334/extract-rows-for-the-first-occurrence-of-a-variable-in-a-data-frame) — germcd, Feb 22 '16 at 21:19
@nicola you are right, it does. I wasn't sure if `aggregate` will work on `data.frame`s instead of columns. `tapply` won't work though, compare `tapply(mtcars[, 1], mtcars[,2], sum)` with `tapply(as.tbl(mtcars)[,1],as.tbl(mtcars)[,2], sum)` for instance. I thought you weren't aware because you've used `df[,2]`, which still returns a `data.frame` rather than a vector in a `dplyr` object, as in `class(as.tbl(mtcars)[,1])` for instance. — David Arenburg, Feb 22 '16 at 21:50
@DavidArenburg Tx for the comment. However, `aggregate` is supposed to work with `data.frame`s: even if the first argument is not, it gets coerced to one (unless it is a time series). So, regardless the class of `df[,2]`, `aggregate` coerces it to a `data.frame` in any case. See `?aggregate`. The only way `aggregate` couldn't possibly work was if `dplyr` had defined its own `aggregate` method with a different logic, but that's not the case. — nicola, Feb 22 '16 at 22:01
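
The `aggregate` call these comments refer to is not shown on this page. As a rough sketch only (not necessarily what nicola proposed), a base R version could look like the following. The `calls` data frame is a toy copy of the question's table, and subscriber 4 is a hypothetical addition to show the no-success case. The sketch assumes rows are already ordered by `count` within each subscriber and that `count` runs 1, 2, 3, ... so that the position of the first 1 equals its `count`.

```r
# Toy data mirroring the question's table (only the columns needed here).
# Subscriber 4 is hypothetical: it never has a successful call.
calls <- data.frame(
  subscriber_id = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 4),
  outcome       = c(1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0),
  count         = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 1, 2)
)

# For each subscriber, find the position of the first outcome equal to 1.
# match() returns the index of the first match, or `nomatch` (0) if none.
first_success <- aggregate(
  outcome ~ subscriber_id,
  data = calls,
  FUN  = function(x) match(1, x, nomatch = 0)
)
names(first_success)[2] <- "first_success"

print(first_success)
```

Because `aggregate` coerces its input to a `data.frame`, the same call should also accept a `dplyr` table, which is the point made in the comments above.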

count · Answer 1 · 2016-02-22T21:32:32.277

1

library(dplyr)

# Keep only successful calls, then take each subscriber's earliest one
data %>%
  group_by(subscriber_id) %>%
  filter(outcome == 1) %>%
  slice(which.min(contact_date)) %>%
  data.frame() %>%
  select(subscriber_id, count)
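
Note that filtering on outcome == 1 drops any subscriber who never had a successful call, whereas the question asks for those to appear with first_success = 0. A possible variant that keeps them is sketched below; it is an illustration under assumptions, not part of the original answer. The `calls` data frame is a toy copy of the question's table, and subscriber 4 is a hypothetical addition with no success.

```r
library(dplyr)

# Toy data mirroring the question's table (only the columns needed here).
# Subscriber 4 is hypothetical: it never has a successful call.
calls <- data.frame(
  subscriber_id = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 4),
  outcome       = c(1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0),
  count         = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 1, 2)
)

# Summarise instead of filtering, so every subscriber keeps one row:
# the smallest count among successful calls, or 0 if there were none.
first_success <- calls %>%
  group_by(subscriber_id) %>%
  summarise(
    first_success = if (any(outcome == 1)) min(count[outcome == 1]) else 0
  )

print(first_success)
```

Using `min(count[outcome == 1])` selects on the outcome and the call counter directly, so it does not depend on the row order or on contact_date.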

edited Feb 22 '16 at 21:32

answered Feb 22 '16 at 21:16


Why are you `slice`ing on the `contact_date` when they ask for `outcome == 1`? – talat Feb 22 '16 at 21:28
