I hope this finds you well. I would like help writing code that identifies a series of trials from their start trigger, ignoring the triggers that immediately follow it. In the example below, I would like to find the first 1 in each run of 1s and take the average of the next three numbers in Value_1 and Value_2. The code should then find the next start (the 8th value, where the next run of 1s begins) and again average the following 3 values, and so on. Thank you for your help; I am happy to answer any questions.

df <- data.frame(Value_1 = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10), Value_2 = c(10,2,3,4,5,6,7,8,10,10,1,2,3,4,5,6,7,8,9,10), Triggers = c(0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0))
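
A minimal base R sketch of this first case, for illustration only. It assumes each run of 1s is uninterrupted and that the start row itself counts as the first of the three rows, as clarified in the comments below (rows where Value_1 = 2, 3, 4 for the first trial). The names `n_rows`, `starts`, and `simple_means` are hypothetical.

```r
# Data from the question
df <- data.frame(
  Value_1 = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),
  Value_2 = c(10,2,3,4,5,6,7,8,10,10,1,2,3,4,5,6,7,8,9,10),
  Triggers = c(0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0)
)

# Assumption: rows to average per trial, counting the start row
n_rows <- 3

# A start is a 1 whose previous value is 0; padding with a leading 0
# lets a 1 in the very first row count as a start too
starts <- which(diff(c(0, df$Triggers)) == 1)

# Average Value_1 and Value_2 over each start row and the rows after it
simple_means <- t(sapply(
  starts,
  function(i) colMeans(df[i + 0:(n_rows - 1), c("Value_1", "Value_2")])
))
simple_means
```

This only works while the runs of 1s have no gaps: any 0 inside a run would make the following 1 look like a new start.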

In the updated_df example below, I would like the code to work through possible interruptions in the trigger values (e.g., a 0 within a run of 1s). It should find the first 1 in a group of 1s and possible 0s and take the average of the next four numbers in Value_1 and Value_2. It should then find the next start (the 9th value, where the next group of 1s and 0s begins) and again average the following 4 values, and so on.

updated_df <- data.frame(
  Value_1 = c(1,2,3,3,4,5,6,7,8,9,9,10,1,2,3,4,5,6,7,8,9,9,10),
  Value_2 = c(10,2,3,3,4,5,6,7,8,10,10,10,1,2,3,4,5,6,6,7,8,9,10),
  Triggers = c(0,1,1,0,1,0,0,0,1,1,0,1,0,0,0,0,0,1,0,1,1,0,0)
)
Caroline
  • Triggers is 10 long, the other variables are 18; not a valid data frame. – Jon Spring Feb 14 '22 at 23:23
  • Thank you for pointing this out! My apologies for the typo. This should be fixed now! – Caroline Feb 14 '22 at 23:36
  • It might help to show some example output. By "the next 3 numbers", for example, do you mean the rows where `Value_1` = 2, 3 and 4? Or where `Value_1` = 1, 2 and 3? And what columns are we averaging? Just `Triggers` or some/all of the others? – neilfws Feb 14 '22 at 23:47
  • Where Value_1 = 2,3, and 4. For the second part of the question, we are using the Triggers variable to identify the time frame and then taking the average of the numbers in columns Value_1 and Value_2. So for Value_1 we should be averaging 2,3,4 and 8,9,10 and 6,7,8 to give us three values (3, 9, and 7). – Caroline Feb 15 '22 at 00:11
  • I've posted a general approach, but a couple of questions: (1) what to do with a series of Triggers like `1, 0, 1, 0, 0...`? The second `1` overlaps with the three trials being averaged by the previous start trigger. Does the first set include only two trials, or do the first and second sets overlap, each including three trials, or is one of them ignored? Or perhaps the structure of your data is such that this is guaranteed not to happen? (2) if a start trigger occurs in the last two rows, does it still count, since only 1 - 2 trials would be included? – zephryl Feb 15 '22 at 02:40
  • The triggers go in a consistent pattern throughout the trial but I want to find just the start of each of the trial groups. However, the pattern is not perfect in that it may go back to zero within a trial. I have updated the example above so now it is the first 4 rows after the first 1 (but I want it to ignore the zero interrupting the series and still put it together as a group). Does this make sense? – Caroline Feb 15 '22 at 18:52

1 Answer

Here's a base R solution that handles the updated question ("interruptions in trigger values"). It includes a lag function based on a Stack Overflow answer (linked in the code comment below).

updated_df <- data.frame(
  Value_1 = c(1,2,3,3,4,5,6,7,8,9,9,10,1,2,3,4,5,6,7,8,9,9,10),
  Value_2 = c(10,2,3,3,4,5,6,7,8,10,10,10,1,2,3,4,5,6,6,7,8,9,10),
  Triggers = c(0,1,1,0,1,0,0,0,1,1,0,1,0,0,0,0,0,1,0,1,1,0,0)
)

# lag function, based on @Andrew's answer at 
# https://stackoverflow.com/a/13128713/17303805
lag_fx <- function(x, by = 1L, default = NA) {
  if (by < 0 || !isTRUE(all.equal(by, round(by)))) {
    stop("`by` should be a whole number >= 0")
  }
  c(rep(default, by), x)[seq_along(x)]
}

# number of trials per set
set_k <- 4

### to find index of each start trigger:
# (1) make matrix to "look back" at previous k - 1 trials
lagged <- sapply(
  1:(set_k - 1), 
  \(x) lag_fx(updated_df$Triggers, by = x, default = 0)
)

# (2) then find rows where trigger == 1, but no 1s in previous k - 1 trials
starts <- which(updated_df$Triggers == 1 & rowSums(lagged) == 0)

# indices of each trigger and following k - 1 rows
sets <- lapply(starts, \(x) x + 0:(set_k - 1))

# means of each set of trials
Value_1 <- sapply(sets, \(x) mean(updated_df$Value_1[x]))
Value_2 <- sapply(sets, \(x) mean(updated_df$Value_2[x]))

# back to a data.frame
data.frame(Value_1, Value_2)

#   Value_1 Value_2
# 1     3.0    3.00
# 2     9.0    9.50
# 3     7.5    6.75
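
As a rough sketch, the steps above could also be wrapped in a reusable function. The name `trial_means` and its arguments are hypothetical, not part of the original answer. It also makes one assumption the question leaves open: a start trigger too close to the end of the data to supply `k` full rows is dropped instead of being averaged over fewer rows.

```r
# Hypothetical helper: mean of each value column over every trial block,
# where a block starts at a 1 with no 1s in the preceding k - 1 rows
trial_means <- function(data, trigger_col = "Triggers",
                        value_cols = c("Value_1", "Value_2"), k = 4) {
  trig <- data[[trigger_col]]
  n <- length(trig)

  # number of 1s among the previous k - 1 rows (0 if there are none to check)
  prev_ones <- vapply(seq_len(n), function(i) {
    lo <- max(1, i - k + 1)
    hi <- i - 1
    if (lo > hi) 0 else sum(trig[lo:hi] == 1)
  }, numeric(1))

  starts <- which(trig == 1 & prev_ones == 0)

  # Assumption: drop starts whose block would run past the last row
  starts <- starts[starts + k - 1 <= n]

  # mean of each value column over each block of k rows
  means <- lapply(value_cols, function(col) {
    vapply(starts, function(s) mean(data[[col]][s + 0:(k - 1)]), numeric(1))
  })
  names(means) <- value_cols
  as.data.frame(means)
}

updated_df <- data.frame(
  Value_1 = c(1,2,3,3,4,5,6,7,8,9,9,10,1,2,3,4,5,6,7,8,9,9,10),
  Value_2 = c(10,2,3,3,4,5,6,7,8,10,10,10,1,2,3,4,5,6,6,7,8,9,10),
  Triggers = c(0,1,1,0,1,0,0,0,1,1,0,1,0,0,0,0,0,1,0,1,1,0,0)
)

res <- trial_means(updated_df, k = 4)
res
```

On `updated_df` this should reproduce the three rows shown above. If incomplete blocks at the end should be kept, the filtering line can be removed and the indices clipped to the number of rows.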
zephryl
  • Thank you so much for this! I think this is very close to what I am looking for. I am wondering if there is a way to adapt the code where it would ignore a possible interfering 0 value that happens during the same Trigger grouping. See the update to the original question for more details. – Caroline Feb 15 '22 at 22:30
  • @Caroline first off, I wanted to alert you that I noticed an error in my base R code (I forgot that `stats::lag()` doesn't do the same thing as `dplyr::lag()`), which I've fixed. Re your question, that's a little bit trickier. You could use multiple lags to test for `1`s in any of the 3 preceding trials... I'll try to post an updated solution if I have a chance. – zephryl Feb 16 '22 at 23:17
  • @Caroline I've updated my answer to handle cases where `0` triggers occur during a trial block. – zephryl Feb 17 '22 at 00:04