0

I have three columns "Name", "success Dummy" and "Date". For each NAME I want to check the PAST SUCCESS for that NAME.

So for example if name "Peter" has occurred three times, for each time I want to count the number of "Peter" with "Success ==1" and Date happened before.

Example of the output that I need to get for "Past Success" column.

Name    Success Date   Past Success
David      1    2018    1
Peter      0    2017    3
Peter      1    2016    2
David      1    2017    0
Peter      1    2015    1
Peter      0    2010    1
Peter      1    2005    0
Peter     NA    2004    0

Is there any way to do it fast?

Also I need it to be very fast because my data is huge.

What I did is I sorted my data based on Names and Dates and check each observation against 100 observations before (because max of frequency of Names is 100).

Please advise if there is a better way to do that.

Frank
  • 66,179
  • 8
  • 96
  • 180
useR
  • 179
  • 3
  • 13
  • 1
    Hi UseR, can you provide us with a minimum reproducible example? I would like to copy and paste the code to create your data frame (or a subset of it) into my R session, and be able to reproduce what you're getting, and then recommend a solution. Thanks :) – mysteRious Apr 04 '19 at 20:57
  • can you provide sample data? – FALL Gora Apr 04 '19 at 20:59
  • Look into the `rle()` function – DanY Apr 04 '19 at 21:00
  • @FALLGora: Lines 6 and 7 of my question are actually a table of sample data and output. It is not showing correctly in columns. Is there a way to fix it? I copy pasted from an excel table – useR Apr 04 '19 at 21:01
  • 1
    This might also be helpful: https://stackoverflow.com/questions/44665944/how-to-count-rows-with-conditional-after-grouping-in-data-table – divibisan Apr 04 '19 at 21:02
  • @mysteRious: it is there now – useR Apr 04 '19 at 21:11

3 Answers3

1

Try this data table approach:

library(data.table)
data <-data.table(Name = rep(c("David","Peter","David","Peter"), c(1,2,1,4)),
                  Success = c(1,0,1,1,1,0,1,NA),
                  Date = c(2018,2017,2016,2017,2015,2010,2005,2004)
                  )
data <- data[order(Date)]
data[Success == 1,"Past Success":= cumsum(Success), by = 'Name']
FALL Gora
  • 481
  • 3
  • 8
  • It generates NAs for three columns. But thanks it is a good way to code it. – useR Apr 04 '19 at 21:27
  • NAs are values that are suppose to be zero. you can update them by doing after: ``data[ ,"Past Success":= ifelse(is.na(`Past Success`),0,`Past Success`)]`` – FALL Gora Apr 04 '19 at 21:38
1

Here are two ways to do it. One of them is almost as @FALL Gora, but the other is from base R

# these two steps are assuming you have data.table
# modify them accordingly if you have data.frame
data <- data[order(Name, Date)]
data[is.na(Success), Success := 0]

### tapply
data$past_success <- unlist(with(data, tapply(Success, Name, cumsum)))

### data.table
data[, past_success_dt := cumsum(Success), by = Name]

data

    Name Success Date past_success past_success_dt
1: David       1 2017            1               1
2: David       1 2018            2               2
3: Peter       0 2004            0               0
4: Peter       1 2005            1               1
5: Peter       0 2010            1               1
6: Peter       1 2015            2               2
7: Peter       1 2016            3               3
8: Peter       0 2017            3               3
cropgen
  • 1,920
  • 15
  • 24
0

For the record: a dplyr approach for a dataframe

library(tidyverse)
data<-data%>%
  arrange(Name, Date) %>%
  group_by(Name) %>%
  mutate(Success = replace_na(Success, 0),
         PastSuccess = cumsum(Success))
data
> data
# A tibble: 8 x 4
# Groups:   Name [2]
  Name  Success  Date PastSuccess
  <fct>   <dbl> <dbl>        <dbl>
1 David       1  2017            1
2 David       1  2018            2
3 Peter       0  2004            0
4 Peter       1  2005            1
5 Peter       0  2010            1
6 Peter       1  2015            2
7 Peter       1  2016            3
8 Peter       0  2017            3
23stacks1254
  • 369
  • 1
  • 9