I am attempting to calculate the retention rate of an Instagram story (the number of viewers on the last frame divided by the number of viewers on the first frame) for each date. I have this data in a data frame in R where each frame is a row, and all frames with the same date make up the story for that date. I am having a hard time figuring out how to get the indices of the first and last frames within each date, divide their view counts, and then apply this to the rest of the data frame. Any help would be greatly appreciated!
In order for people to help you effectively, please post a reproducible example and show any attempts you have made to solve the problem. A good guideline to follow is at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – BigFinger Jan 07 '20 at 20:59
Since you haven't provided your data or a reproducible example, I'll have to make some assumptions. First, I'll try to recreate your data frame from your description of it. It sounds like it looks something like this:
df
#> dates views
#> 1 2020-01-01 32
#> 2 2020-01-01 28
#> 3 2020-01-01 28
#> 4 2020-01-01 28
#> 5 2020-01-02 28
#> 6 2020-01-02 26
#> 7 2020-01-02 26
#> 8 2020-01-02 25
#> 9 2020-01-03 25
#> 10 2020-01-03 25
#> 11 2020-01-03 25
#> 12 2020-01-03 25
#> 13 2020-01-04 23
#> 14 2020-01-04 20
#> 15 2020-01-04 20
#> 16 2020-01-04 20
#> 17 2020-01-05 18
#> 18 2020-01-05 17
#> 19 2020-01-05 17
#> 20 2020-01-05 17
#> 21 2020-01-06 15
#> 22 2020-01-06 13
#> 23 2020-01-06 12
#> 24 2020-01-06 10
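To make the example reproducible, a data frame matching the printout above can be built as follows. This is a sketch that assumes dates is stored as an R Date and views as a number; the columns in your real data may be character or integer instead.

```r
# Rebuild the example data frame printed above: six story dates,
# four frames per date, with each date's frames listed first to last.
df <- data.frame(
  dates = rep(as.Date("2020-01-01") + 0:5, each = 4),
  views = c(32, 28, 28, 28,
            28, 26, 26, 25,
            25, 25, 25, 25,
            23, 20, 20, 20,
            18, 17, 17, 17,
            15, 13, 12, 10)
)
```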
Of course, the following code will only work if you substitute your data frame's name for df, and your actual column names for dates and views. I will also assume the frames within each date are ordered from first to last, since that is what your question implies. If that is the case, you can do:
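# One data frame per date; retention = last frame's views / first frame's views
result <- do.call("rbind", lapply(split.data.frame(df, df$dates), function(x) {
  data.frame(date = x$dates[1], retention = x$views[nrow(x)] / x$views[1])
}))
# Reset the row names to plain row numbers
rownames(result) <- 1:nrow(result)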
which gives you this:
result
#> date retention
#> 1 2020-01-01 0.8750000
#> 2 2020-01-02 0.8928571
#> 3 2020-01-03 1.0000000
#> 4 2020-01-04 0.8695652
#> 5 2020-01-05 0.9444444
#> 6 2020-01-06 0.6666667
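Since the question asks specifically about finding the first and last frame of each date, here is a base R sketch that gets those row indices directly with duplicated(), as an alternative to splitting the data frame. It relies on the same assumption as above (within each date, rows run from the first frame to the last), and retention_by_date is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: assumes df has columns `dates` and `views`, and that
# within each date the rows run from the first frame to the last.
retention_by_date <- function(df) {
  # Row index of the first frame of each date, in order of appearance
  first_idx <- which(!duplicated(df$dates))
  # Row index of the last frame of each date (duplicated() scanning from the end)
  last_rows <- which(!duplicated(df$dates, fromLast = TRUE))
  # Reorder so that last_idx[k] belongs to the same date as first_idx[k]
  last_idx <- last_rows[match(df$dates[first_idx], df$dates[last_rows])]
  data.frame(
    date = df$dates[first_idx],
    retention = df$views[last_idx] / df$views[first_idx]
  )
}
```

On the example data, retention_by_date(df) should give the same retention values as result. The match() step pairs each date's first and last rows even if a date's frames are not contiguous in the data frame.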

Allan Cameron
Thank you so much for your help, Allan! I apologize for the lack of data, but you did exactly what I needed. Thanks! – shaun_mccracken Jan 09 '20 at 19:44