Tracking the change in a sequence in R

Question

I asked something similar here, but the function I was given caused some issues, so I will try to ask this as clearly as I can.

I have a sample dataset that looks like this:

    id <-       c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8)
    item.id <-  c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2)
    sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1)
    score <-    c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2)

    data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
> data
    id item.id sequence score
1   1       1        1     0
2   1       1        2     0
3   1       2        1     0
4   2       1        1     0
5   2       1        2     0
6   2       1        3     1
7   3       1        1     2
8   3       1        2     0
9   4       1        1     1
10  4       2        1     1
11  5       1        1     1
12  5       2        1     0
13  5       2        2     1
14  5       2        3     1
15  6       1        1     0
16  6       1        2     0
17  6       1        3     0
18  7       1        1     1
19  8       1        1     0
20  8       2        1     2

id identifies each student, item.id identifies the question the student answered, sequence is the attempt number for each item.id, and score is the score for each attempt (0, 1, or 2). Students can change their answers.

For each item.id within each id, I want to create a variable (status) based on the last two attempts (the last two sequence values):

a) assign "WW" for those who changed from wrong to wrong,
b) assign "WR" for those who changed from wrong to right,
c) assign "RW" for those who changed from right to wrong, and
d) assign "RR" for those who changed from right to right.

A score change from 0 to 1 or from 0 to 2 is considered a correct (right) change, while a score change from 1 to 0 or from 2 to 0 is considered an incorrect (wrong) change.

If there is only one attempt for an item.id, as for id=7, the status should be "one.right" when the score is 1 or 2 and "one.wrong" when the score is 0. In general, a score of 1 or 2 is considered right and a score of 0 is considered wrong.
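To make the rules concrete, here is a minimal base R sketch of how a single (id, item.id) group could be classified. The helper name `status_from_scores` is hypothetical, and it assumes the scores are passed in attempt order.

```r
# Hypothetical helper: classify one (id, item.id) group from its scores.
# Assumes `score` is already ordered by attempt number (sequence).
status_from_scores <- function(score) {
  rw <- ifelse(score > 0, "R", "W")   # 1 or 2 counts as right, 0 as wrong
  if (length(rw) == 1) {
    # A single attempt gets its own label
    return(if (rw == "R") "one.right" else "one.wrong")
  }
  paste(tail(rw, 2), collapse = "")   # only the last two attempts matter
}

status_from_scores(c(0, 0, 1))  # id = 2, item.id = 1
status_from_scores(c(2, 0))     # id = 3, item.id = 1
status_from_scores(1)           # id = 7, item.id = 1
```

The three calls correspond to rows in the sample data whose expected statuses are "WR", "RW" and "one.right".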

The desired output would be:

> desired
     id item.id    status
  1   1       1        WW
  2   1       2 one.wrong
  3   2       1        WR
  4   3       1        RW
  5   4       1 one.right
  6   4       2 one.right
  7   5       1 one.right
  8   5       2        RR
  9   6       1        WW
  10  7       1 one.right
  11  8       1 one.wrong
  12  8       2 one.right

Any suggestions? Thanks!

Iaroslav Domin · Accepted Answer · 2019-10-30T22:41:08.110

library(dplyr)
library(purrr)
library(forcats)

data %>% 
  mutate(status = ifelse(score > 0, "R", "W")) %>% 
  group_by(id, item.id) %>% 
  filter(sequence == n() - 1 | sequence == n()) %>%  
  summarise(status = paste(status, collapse = "")) %>% 
  ungroup() %>% 
  mutate(status = fct_recode(status, "one.wrong" = "W", "one.right" = "R"))

I believe it's pretty much self-explanatory, but I'll break it down:

1) In the first mutate we create a W/R column from score: 0 gives 'W', everything above gives 'R'.

2) Then we group the data by id, item.id and keep the last two rows of each group, or the single row if the group has only one (filter). This relies on sequence running from 1 to n() within each group.

3) After that we squeeze this status column into one string in each group (summarize). So the possible values are: 'W', 'R', 'WW', 'WR', 'RW', 'RR'.

4) The last thing that is left to do is to recode 'W' to 'one.wrong' and 'R' to 'one.right', using forcats::fct_recode.
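The same four steps can also be sketched without relying on sequence being numbered 1 to n() within each group. This variant is an assumption-laden alternative, not the code above: it assumes dplyr 1.0.0 or later (for `slice_tail()` and the `.groups` argument), swaps `fct_recode` for `dplyr::recode`, and uses a shortened copy of the sample data (named `small` here for illustration).

```r
library(dplyr)

# First three students from the question's sample data
small <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 3, 3),
  item.id  = c(1, 1, 2, 1, 1, 1, 1, 1),
  sequence = c(1, 2, 1, 1, 2, 3, 1, 2),
  score    = c(0, 0, 0, 0, 0, 1, 2, 0)
)

res <- small %>%
  mutate(status = ifelse(score > 0, "R", "W")) %>%   # step 1: W/R per attempt
  arrange(id, item.id, sequence) %>%                 # make attempt order explicit
  group_by(id, item.id) %>%
  slice_tail(n = 2) %>%                              # step 2: last two rows, or the only row
  summarise(status = paste(status, collapse = ""),   # step 3: collapse to one string
            .groups = "drop") %>%
  mutate(status = recode(status,                     # step 4: relabel single attempts
                         W = "one.wrong", R = "one.right"))

res
```

For these rows the statuses should come out as WW, one.wrong, WR and RW, matching the first four rows of the desired output.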

score 1 · Answer 2 · answered Oct 30 '19 at 22:19

Similar to, but not as elegant as, @Iaroslav Domin's answer:

library(tidyverse) 
data %>%
  group_by(id, item.id) %>%
  top_n(2, sequence) %>%
  mutate(sequence = row_number()) %>%
  pivot_wider(names_from = sequence, 
              names_prefix = "c", 
              values_from = score) %>%
  mutate(result = case_when(
    c1 == 0 & c2 == 0 ~ "WW",
    c1 == 0 & c2 >  0 ~ "WR",
    c1 >  0 & c2 == 0 ~ "RW",   # a score of 1 or 2 counts as right
    c1 >  0 & c2 >  0 ~ "RR",
    c1 == 0 ~ "one.wrong",
    c1 >  0 ~ "one.right",
    TRUE ~ "OTHER")
  )

answered Oct 30 '19 at 22:19

Jon Spring


@Spring, thanks for your reply. After I ran your code, I received this error: ``Error in rank(x, ties.method = "first", na.last = "keep") : argument "x" is missing, with no default``. I have both the `plyr` and `dplyr` libraries loaded. Could that cause the issue? – amisos55 Oct 31 '19 at 01:32
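The thread does not confirm the cause of this error. One plausible explanation is that `plyr`, when loaded after `dplyr`, masks `mutate()`, so `row_number()` ends up being evaluated outside dplyr's grouped context. A hedged sketch of a workaround is to namespace-qualify the dplyr calls. The data frame below is a shortened copy of the sample data (named `small` for illustration), and the `case_when` conditions treat any score above 0 as right.

```r
library(dplyr)
library(tidyr)

small <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 3, 3),
  item.id  = c(1, 1, 2, 1, 1, 1, 1, 1),
  sequence = c(1, 2, 1, 1, 2, 3, 1, 2),
  score    = c(0, 0, 0, 0, 0, 1, 2, 0)
)

res <- small %>%
  dplyr::group_by(id, item.id) %>%
  dplyr::top_n(2, sequence) %>%                        # keep the last two attempts
  dplyr::mutate(sequence = dplyr::row_number()) %>%    # renumber them as 1 and 2
  tidyr::pivot_wider(names_from = sequence,
                     names_prefix = "c",
                     values_from = score) %>%
  dplyr::mutate(result = dplyr::case_when(
    c1 == 0 & c2 == 0 ~ "WW",
    c1 == 0 & c2 >  0 ~ "WR",
    c1 >  0 & c2 == 0 ~ "RW",
    c1 >  0 & c2 >  0 ~ "RR",
    c1 == 0 ~ "one.wrong",                             # c2 is NA: single attempt
    c1 >  0 ~ "one.right",
    TRUE ~ "OTHER"
  ))

res
```

Explicit `dplyr::` prefixes make the pipeline independent of the order in which the two packages were attached.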

score 1 · Answer 3 · answered Oct 31 '19 at 11:09

This is a data.table solution that takes some inspiration from @Iaroslav Domin's answer:

library(data.table)
setDT(data)

data[, {
  if (.N == 1) {
    if (score == 0) {
      'one.wrong'
    } else {
      'one.right'
    }
  } else {
    # last two attempts; assumes rows are ordered by sequence within each group
    paste0(ifelse(score > 0, 'R', 'W')[c(.N - 1, .N)], collapse = '')
  }
},
by = .(id, item.id)]
