Running Count For Sequences in Groups

Question

Let's say that I want to count the number of correct or incorrect responses in a row. If you look at the column "count," I pretty much want that, but I want it to start back at one every time there is a change from correct to incorrect and vice versa. I also want it to start back at one every time the condition or module changes.

I've found two solutions that should solve this problem, but neither works in my case. Here is one of them: Running Count within groups in a dataframe. I believe it isn't working because I also need to group by condition_id, which is numeric and simply counts 1, 2, 3, 4, 5, ... until the sequence ends.

Thanks much!

dat %>%
  group_by(pid, module, condition) %>%
  arrange(pid, module, condition, condition_id) %>%
  mutate(num.correct = ifelse(timing == "correct", 1, 0)) %>%
  group_by(pid, module, condition, num.correct) %>%
  mutate(count = seq(n()))

dput:

structure(list(pid = c("ADMIN-UCSF-bo001", "ADMIN-UCSF-bo001", 
"ADMIN-UCSF-bo001", "ADMIN-UCSF-bo001", "ADMIN-UCSF-bo001", "ADMIN-UCSF-bo001", 
"ADMIN-UCSF-bo001", "ADMIN-UCSF-bo001", "ADMIN-UCSF-bo001", "ADMIN-UCSF-bo001", 
"ADMIN-UCSF-bo001"), grade = c("3", "3", "3", "3", "3", "3", 
"3", "3", "3", "3", "3"), gender = c("F", "F", "F", "F", "F", 
"F", "F", "F", "F", "F", "F"), Teacher = c("Keith, Susan", "Keith, Susan", 
"Keith, Susan", "Keith, Susan", "Keith, Susan", "Keith, Susan", 
"Keith, Susan", "Keith, Susan", "Keith, Susan", "Keith, Susan", 
"Keith, Susan"), module = c("BOXED", "BOXED", "BOXED", "BOXED", 
"BOXED", "BOXED", "BOXED", "BOXED", "BOXED", "BOXED", "BOXED"
), condition = c("Conjunction_4", "Conjunction_4", "Conjunction_4", 
"Conjunction_4", "Conjunction_4", "Conjunction_4", "Conjunction_4", 
"Conjunction_4", "Conjunction_4", "Conjunction_4", "Conjunction_4"
), trial_id = c(65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75), 
    condition_id = c(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 
    25), correct_button = c("correct", "correct", "correct", 
    "incorrect", "incorrect", "incorrect", "incorrect", "incorrect", 
    "correct", "incorrect", "correct"), rt = c(660.721957683563, 
    728.28596830368, 509.469985961914, 744.082987308502, 843.548953533173, 
    1161.27300262451, 961.09801530838, 928.547024726868, 711.355030536652, 
    710.889995098114, 877.265989780426), rw = c(1160, 1080, 920, 
    600, 640, 680, 760, 920, 1240, 1230, 1270), last = c(1270, 
    1270, 1270, 1270, 1270, 1270, 1270, 1270, 1270, 1270, 1270
    ), time = c("2017-04-07", "2017-04-07", "2017-04-07", "2017-04-07", 
    "2017-04-07", "2017-04-07", "2017-04-07", "2017-04-07", "2017-04-07", 
    "2017-04-07", "2017-04-07"), timing = c("correct", "correct", 
    "correct", "incorrect", "incorrect", "incorrect", "incorrect", 
    "incorrect", "correct", "incorrect", "correct")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -11L))

Okay, I just changed the dput so that I don't have to include the rt part. It should work now. — James, Apr 29 '20 at 17:26

eipi10 · Accepted Answer · 2020-04-29T17:44:12.250

Let me know if this is what you were trying to do. In the code below we use cumsum to create groups, with a new group created each time correct_button changes (within a given combination of pid, module, and condition). Then we just enumerate each run:

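library(dplyr)

dat %>%
  group_by(pid, module, condition) %>%
  # change.pt goes up by 1 each time correct_button differs from the previous row
  mutate(change.pt = c(0, cumsum(diff(as.numeric(factor(correct_button))) != 0))) %>%
  group_by(pid, module, condition, change.pt) %>%
  # number the rows within each run
  mutate(run.count = 1:n())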

                pid grade gender      Teacher module     condition condition_id correct_button   rw last       time change.pt run.count
1  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           15        correct 1160 1270 2017-04-07         0         1
2  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           16        correct 1080 1270 2017-04-07         0         2
3  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           17        correct  920 1270 2017-04-07         0         3
4  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           18      incorrect  600 1270 2017-04-07         1         1
5  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           19      incorrect  640 1270 2017-04-07         1         2
6  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           20      incorrect  680 1270 2017-04-07         1         3
7  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           21      incorrect  760 1270 2017-04-07         1         4
8  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           22      incorrect  920 1270 2017-04-07         1         5
9  ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           23        correct 1240 1270 2017-04-07         2         1
10 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           24      incorrect 1230 1270 2017-04-07         3         1
11 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED Conjunction_4           25        correct 1270 1270 2017-04-07         4         1
12 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            1        correct 1000  740 2017-04-07         0         1
13 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            2        correct  990  740 2017-04-07         0         2
14 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            3      incorrect  980  740 2017-04-07         1         1
15 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            4        correct 1020  740 2017-04-07         2         1
16 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            5        correct 1010  740 2017-04-07         2         2
17 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            6        correct 1000  740 2017-04-07         2         3
18 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            7        correct  980  740 2017-04-07         2         4
19 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            8        correct  940  740 2017-04-07         2         5
20 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12            9      incorrect  860  740 2017-04-07         3         1
21 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           10        correct  900  740 2017-04-07         4         1
22 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           11        correct  890  740 2017-04-07         4         2
23 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           12        correct  880  740 2017-04-07         4         3
24 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           13        correct  860  740 2017-04-07         4         4
25 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           14      incorrect  820  740 2017-04-07         5         1
26 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           15      incorrect  860  740 2017-04-07         5         2
27 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           16        correct  900  740 2017-04-07         6         1
28 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           17        correct  890  740 2017-04-07         6         2
29 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           18        correct  880  740 2017-04-07         6         3
30 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           19      incorrect  860  740 2017-04-07         7         1
31 ADMIN-UCSF-bo001     3      F Keith, Susan  BOXED    Feature_12           20        correct  900  740 2017-04-07         8         1
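
To see why the cumsum step produces one group per run, here is a minimal base-R sketch on a toy vector. The vector `x` and the names `changed`, `change_pt`, and `run_count` are made up for illustration; `x` stands in for correct_button within a single pid/module/condition group.

```r
# Toy vector standing in for correct_button within one group (illustrative data)
x <- c("correct", "correct", "incorrect", "incorrect", "incorrect", "correct")

# TRUE wherever a value differs from the one before it
changed <- x[-1] != x[-length(x)]

# Running total of changes gives a run id; the first element starts at 0
change_pt <- c(0, cumsum(changed))

# Number the rows within each run id
run_count <- ave(seq_along(x), change_pt, FUN = seq_along)

print(data.frame(x, change_pt, run_count))
```

Comparing neighbouring values directly does the same job as diff(as.numeric(factor(...))) != 0 in the answer, without the detour through factor codes.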

Yes. And I was working on it for four hours...I feel both angry and thankful! — James, Apr 29 '20 at 17:29
The only thing that I believe has to be changed in your code is the "run.count" variable (to any other name) in the last line, because it's the grouping variable. — James, Apr 29 '20 at 17:35
Oh, I'm using the development version of `dplyr` (soon to be released on CRAN as version 1.0.0), which removes that restriction. But, as you say, in earlier versions, you need to create a new column rather than mutate a grouping column. — eipi10, Apr 29 '20 at 17:37
Actually, the other issue I see is that it isn't restarting with a new condition. You can see this in your example when condition Conjunction_4 changes to Feature_12. — James, Apr 29 '20 at 17:41
So, this is perfect. I didn't put this in my question because then things get absurdly complicated, but there are times when rt == 0 because it's a false alarm and RW stays the same. In that case, I can just write a conditional like ```rt == 0.00000, NA, "correct")``` for correct_button, but how do I make your code skip "NA" values? When I try that in my own code, including the NAs seems to mess everything up. I can also include a reproducible example and edit my previous question, or ask a new question altogether. Whatever is easiest. — James, Apr 29 '20 at 18:14
It would be helpful if you provided a sample data frame that reflects this issue. Also, if possible, exclude any columns from your sample data that aren't necessary for the problem at hand. — eipi10, Apr 29 '20 at 18:31
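
The thread never settles what "skip" should mean for NA values, so the following is only a sketch under one assumption: NA rows get an NA count and do not interrupt the run around them. The data and the names `keep`, `runs`, and `run_count` are hypothetical.

```r
# Hypothetical responses; NA marks a trial to be skipped (e.g. a false alarm)
x <- c("correct", NA, "correct", "incorrect", NA, "incorrect", "correct")

keep <- !is.na(x)      # rows that take part in the run count
runs <- rle(x[keep])   # run lengths over the non-missing values only

# Start with all NA, then fill in 1..k within each run for the kept rows
run_count <- rep(NA_integer_, length(x))
run_count[keep] <- sequence(runs$lengths)

print(data.frame(x, run_count))
```

If an NA should instead end the current run, the logic would need to change. In a grouped dplyr pipeline, the same steps could be wrapped in a small helper function and called inside mutate() after group_by(pid, module, condition).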

Ian Campbell · Answer 2 · 2020-04-29T17:45:35.503

I think this is most easily achieved with data.table::rleid.

One thing to note is that you can create a new column from within group_by.

library(dplyr)
library(data.table)
dat %>%
  group_by(pid, module, condition, rleid = rleid(correct_button)) %>%
  mutate(count = 1:n())
# A tibble: 11 x 16
# Groups:   pid, module, condition, rleid [5]
   rleid pid              grade gender Teacher      module condition     trial_id condition_id correct_button    rt    rw  last time       timing    count
   <int> <chr>            <chr> <chr>  <chr>        <chr>  <chr>            <dbl>        <dbl> <chr>          <dbl> <dbl> <dbl> <chr>      <chr>     <int>
 1     1 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       65           15 correct         661.  1160  1270 2017-04-07 correct       1
 2     1 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       66           16 correct         728.  1080  1270 2017-04-07 correct       2
 3     1 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       67           17 correct         509.   920  1270 2017-04-07 correct       3
 4     2 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       68           18 incorrect       744.   600  1270 2017-04-07 incorrect     1
 5     2 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       69           19 incorrect       844.   640  1270 2017-04-07 incorrect     2
 6     2 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       70           20 incorrect      1161.   680  1270 2017-04-07 incorrect     3
 7     2 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       71           21 incorrect       961.   760  1270 2017-04-07 incorrect     4
 8     2 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       72           22 incorrect       929.   920  1270 2017-04-07 incorrect     5
 9     3 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       73           23 correct         711.  1240  1270 2017-04-07 correct       1
10     4 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       74           24 incorrect       711.  1230  1270 2017-04-07 incorrect     1
11     5 ADMIN-UCSF-bo001 3     F      Keith, Susan BOXED  Conjunction_4       75           25 correct         877.  1270  1270 2017-04-07 correct       1
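
If you would rather not load data.table, base R's rle() can give a similar run id and within-run count. This is a rough sketch on an illustrative vector, not a drop-in replacement for rleid; the names `runs`, `run_id`, and `count` are chosen here for the example.

```r
# Illustrative vector standing in for correct_button within one group
x <- c("correct", "correct", "correct", "incorrect", "incorrect", "correct")

runs <- rle(x)   # lengths and values of consecutive runs

# Run id: 1 for the first run, 2 for the second, and so on
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Position within each run, restarting at 1 whenever the value changes
count <- sequence(runs$lengths)

print(data.frame(x, run_id, count))
```

On a full data frame this would still need to be applied separately within each pid/module/condition group.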

Yes, it works! I see you changed it so that it also counts incorrect as well. The answers are indistinguishable. Thanks. — James, Apr 29 '20 at 17:49
I also didn't know about creating a new column within "group_by"...that's nifty! — James, Apr 29 '20 at 17:50
