Trying to calculate retention rate in R, how to divide one row by another with the same date and then apply same logic across entire data frame?

Question

I am attempting to calculate the retention rate of an Instagram story (# of viewers on the last frame divided by # of viewers on the first frame) within the same date. I have this data in a data frame in R where each frame is listed as a row, and all frames with the same date make up the entire story for that date. I am having a hard time figuring out how to obtain the index of the first and last frame within each date, divide one by the other, and then apply this to the rest of the data frame. Any help would be greatly appreciated!

In order for people to help you effectively, please post a reproducible example and show any attempts you have made to solve the problem. A good guideline to follow is at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — BigFinger, Jan 07 '20 at 20:59

score 0 · Accepted Answer · answered Jan 07 '20 at 21:22

Since you haven't provided your data or a reproducible example, I'll have to make some assumptions. First, I'll need to attempt to recreate your data frame from your description of it. It sounds like it looks something like this:

df
#>        dates views
#> 1  2020-01-01    32
#> 2  2020-01-01    28
#> 3  2020-01-01    28
#> 4  2020-01-01    28
#> 5  2020-01-02    28
#> 6  2020-01-02    26
#> 7  2020-01-02    26
#> 8  2020-01-02    25
#> 9  2020-01-03    25
#> 10 2020-01-03    25
#> 11 2020-01-03    25
#> 12 2020-01-03    25
#> 13 2020-01-04    23
#> 14 2020-01-04    20
#> 15 2020-01-04    20
#> 16 2020-01-04    20
#> 17 2020-01-05    18
#> 18 2020-01-05    17
#> 19 2020-01-05    17
#> 20 2020-01-05    17
#> 21 2020-01-06    15
#> 22 2020-01-06    13
#> 23 2020-01-06    12
#> 24 2020-01-06    10
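
To make the example reproducible, here is one way a data frame matching the printout above could be built. This is only a sketch: the construction code was not part of the original post, and storing dates as Date objects is an assumption, since the printout does not show the column type.

```r
# Sketch: rebuild the example data frame shown above.
# Assumption: `dates` is a Date column, with four frames per day for six days.
df <- data.frame(
  dates = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 6), each = 4),
  views = c(32, 28, 28, 28,
            28, 26, 26, 25,
            25, 25, 25, 25,
            23, 20, 20, 20,
            18, 17, 17, 17,
            15, 13, 12, 10)
)
```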

So, of course, the following code will only work if you replace df with your data frame's name, and dates and views with the appropriate column names. I will also assume the rows within each date group are ordered from first frame to last, since that is what your question implies. If that is the case, then you can do:

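# Split by date, then divide the last frame's views by the first frame's views
result <- do.call("rbind", lapply(split.data.frame(df, df$dates), function(x) {
  data.frame(date = x$dates[1], retention = x$views[nrow(x)] / x$views[1])
}))
rownames(result) <- NULL  # replace the date row names with 1, 2, ...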

which gives you this:

result
#>         date retention
#> 1 2020-01-01 0.8750000
#> 2 2020-01-02 0.8928571
#> 3 2020-01-03 1.0000000
#> 4 2020-01-04 0.8695652
#> 5 2020-01-05 0.9444444
#> 6 2020-01-06 0.6666667
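
If the rows are not guaranteed to be in frame order, the same approach should still work after sorting first. The sketch below assumes a hypothetical `frame` column giving each frame's position within the story; that column is not in the question, so substitute whatever ordering column your data actually has. The names `df_unordered` and `df_sorted` are also just for illustration.

```r
# Small example with rows deliberately out of order.
# `frame` is a hypothetical column: the position of each frame in the story.
df_unordered <- data.frame(
  dates = as.Date(c("2020-01-02", "2020-01-01", "2020-01-01", "2020-01-02")),
  frame = c(2, 2, 1, 1),
  views = c(25, 28, 32, 28)
)

# Sort by date, then by frame position within each date.
df_sorted <- df_unordered[order(df_unordered$dates, df_unordered$frame), ]

# Same split / lapply / rbind approach as above, applied to the sorted data.
result <- do.call("rbind", lapply(split.data.frame(df_sorted, df_sorted$dates), function(x) {
  data.frame(date = x$dates[1], retention = x$views[nrow(x)] / x$views[1])
}))
rownames(result) <- NULL
```

Without the sort, the first and last rows of each date group would not necessarily be the first and last frames, and the ratio would be computed from the wrong rows.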

Thank you so much for your help Allan! I apologize for the lack of data but you did exactly what I needed. Thanks! — shaun_mccracken, Jan 09 '20 at 19:44
