1

I am trying to calculate a variable that depends on the value of multiple other columns, but in other rows. Here's the sample data:

set.seed(2)
df1 <- data.frame(Participant=c(rep(1,5),rep(2,7),rep(3,10)), 

                            Action=sample(c(rep("Play",9),rep("Other",13))), 
                            time = c(sort(runif(5,1,100)),sort(runif(7,1,100)),sort(runif(10,1,100))))
df1$Action[2] ="Play" # edited to provide important test case

What I am trying to achieve is a column that tests whether the last "play" event is at most 10s ago (time column). If there is no "Play" event in the last 10s, the value of StillPlaying should be "n", regardless of current action. Here's a sample of what I would like to have:

   Part Action  time        StillPlaying
1   1   Play    15.77544    n
2   1   Play    15.89964    y
3   1   Other   35.37995    n
4   1   Play    49.38855    n
5   1   Other   83.85203    n
6   2   Other   2.031038    n
7   2   Play    14.10483    n
8   2   Other   17.29958    y
9   2   Play    36.3492     n
10  2   Play    81.20902    n
11  2   Other   87.01724    y
12  2   Other   96.30176    n
Esther
  • 441
  • 2
  • 15

1 Answers1

2

It seems like you want to group by participant and flag any row with action "Other" and where the last "Play" was within 10 seconds. You can do this using group_by in dplyr, using cummax to determine the last time a "Play" action occurred:

library(dplyr)
df1 %>%
  group_by(Participant) %>%
  mutate(StillPlaying=ifelse(time - c(-100, head(cummax(ifelse(Action == "Play", time, -100)), -1)) <= 10, "y", "n"))
#    Participant Action      time StillPlaying
#          (dbl) (fctr)     (dbl)        (chr)
# 1            1   Play 15.775439            n
# 2            1   Play 15.899643            y
# 3            1  Other 35.379953            n
# 4            1   Play 49.388550            n
# 5            1  Other 83.852029            n
# 6            2  Other  2.031038            n
# 7            2   Play 14.104828            n
# 8            2  Other 17.299582            y
# 9            2   Play 36.349196            n
# 10           2   Play 81.209022            n
# ..         ...    ...       ...          ...

If you want to keep this in base R, you could do split-apply-combine with the same basic commands using:

do.call(rbind, lapply(split(df1, df1$Participant), function(x) {
  x$StillPlaying <- ifelse(x$time - c(-100, head(cummax(ifelse(x$Action == "Play", x$time, -100)), -1)) <= 10, "y", "n")
  x
}))
#      Participant Action      time StillPlaying
# 1.1            1   Play 15.775439            n
# 1.2            1   Play 15.899643            y
# 1.3            1  Other 35.379953            n
# 1.4            1   Play 49.388550            n
# 1.5            1  Other 83.852029            n
# 2.6            2  Other  2.031038            n
# 2.7            2   Play 14.104828            n
# 2.8            2  Other 17.299582            y
# 2.9            2   Play 36.349196            n
# 2.10           2   Play 81.209022            n
# 2.11           2  Other 87.017243            y
# 2.12           2  Other 96.301761            n
# ...
josliber
  • 43,891
  • 12
  • 98
  • 133
  • sorry, I tested the solution more and realized my question wasn't quite clear - I want to know at every action, whether something is still playing (so either for play or for other actions). I have managed to modify the code to do almost what I need: _**df1 %>% group_by(Participant) %>% mutate(StillPlaying=ifelse( time - cummax(ifelse(Action == "Play", time,-100)) <= 10, "y", "n"))_** but I want the formula to say "n" on playing actions when there's nothing else playing anymore. thanks for any pointers. – Esther Feb 15 '16 at 00:25
  • @Esther I've updated the answer based on your edits to the question – josliber Feb 17 '16 at 01:30