
I asked a very similar question before, but I now have a better understanding of my problem. I will try to ask it as clearly as I can.

I have a sample dataset that looks like this:

id <-       c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id <-  c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score <-    c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)

data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
> data
   id item.id sequence score
1   1       1        1     0
2   1       1        2     0
3   1       2        1     0
4   2       1        1     0
5   2       1        2     0
6   2       1        3     1
7   3       1        1     2
8   3       1        2     0
9   4       1        1     1
10  4       2        1     1
11  5       1        1     1
12  5       2        1     0
13  5       2        2     1
14  5       2        3     1
15  6       1        1     0
16  6       1        2     0
17  6       1        3     0
18  7       1        1     1
19  8       1        1     0
20  8       2        1     2
21  9       1        1     1
22  9       1        2     2
23 10       1        1     2
24 10       1        2     1

id identifies each student, item.id identifies the question (item) the student answered, sequence is the attempt number for that item.id, and score is the score for each attempt (0, 1, or 2). Students can change their answers.

For each item.id within each id, I want to create a variable, status, based on the change between the last two attempts (sequences):

a) assign "WW" when the score stays wrong (0 to 0),
b) assign "WR" when the score increases (0 to 1, 0 to 2, or 1 to 2),
c) assign "RW" when the score decreases (2 to 1, 2 to 0, or 1 to 0), and
d) assign "RR" when the score stays right (1 to 1 or 2 to 2).

A score change from 0 to 1, 0 to 2, or 1 to 2 is considered a correct (right) change, while a change from 1 to 0, 2 to 0, or 2 to 1 is considered an incorrect (wrong) change.

If there is only one attempt for an item.id, as for id = 7, the status should be "one.right" if the score is right and "one.wrong" if the score is wrong. A score counts as right when it is 1 or 2 and as wrong when it is 0.
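Taken together, these rules depend only on the sign of the last score change, plus a special case for single attempts. A minimal base-R sketch of the rule for one item, where `item_status()` is a hypothetical helper name (not from the question or any package) that takes that item's scores in attempt order:

```r
# Hypothetical helper: classify one item's attempts, given its scores in
# attempt (sequence) order. Scores are 0 (wrong), 1 or 2 (right).
item_status <- function(scores) {
  n <- length(scores)
  if (n == 1) {
    # Single attempt: only whether the score is right or wrong matters
    return(if (scores > 0) "one.right" else "one.wrong")
  }
  prev <- scores[n - 1]
  curr <- scores[n]
  if (curr > prev) {
    "WR"                      # score went up: 0->1, 0->2, 1->2
  } else if (curr < prev) {
    "RW"                      # score went down: 1->0, 2->0, 2->1
  } else if (curr == 0) {
    "WW"                      # unchanged and wrong: 0->0
  } else {
    "RR"                      # unchanged and right: 1->1, 2->2
  }
}

item_status(c(0, 0, 1))  # "WR"
item_status(c(2, 1))     # "RW"
item_status(1)           # "one.right"
```

Only the last two scores are compared, so earlier attempts (such as the first 0 in c(0, 0, 1)) do not affect the result.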

The desired output is:

 > desired
     id item.id    status
  1   1       1        WW
  2   1       2 one.wrong
  3   2       1        WR
  4   3       1        RW
  5   4       1 one.right
  6   4       2 one.right
  7   5       1 one.right
  8   5       2        RR
  9   6       1        WW
  10  7       1 one.right
  11  8       1 one.wrong
  12  8       2 one.right
  13  9       1        WR
  14  10      1        RW

The main difference from the previous version of the question is that there I did not treat the changes

a) from 1 to 2 as WR; they were coded as RR instead,
b) from 2 to 1 as RW; they were coded as RR instead.

Again, the logic is: if the score increases, the status should be WR; if it decreases, it should be RW.

The best answer I received was this:

library(dplyr)
library(purrr)
library(forcats)

data %>% 
  mutate(status = ifelse(score > 0, "R", "W")) %>%      # 1 or 2 = right, 0 = wrong
  group_by(id, item.id) %>% 
  filter(sequence == n() - 1 | sequence == n()) %>%     # last two attempts (sequence runs 1..n)
  summarise(status = paste(status, collapse = "")) %>%  # e.g. "W" and "R" -> "WR"
  ungroup() %>% 
  mutate(status = fct_recode(status, "one.wrong" = "W", "one.right" = "R"))  # single attempts

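But this only checks whether each score is right or wrong, so changes from 1 to 2 and from 2 to 1 both come out as RR. I need to handle increasing and decreasing score patterns.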
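The problem can be seen by applying the answer's right/wrong coding on its own (`to_rw` is a hypothetical name used only for this illustration): once each score is reduced to R or W, an increase and a decrease between two right scores paste to the same string.

```r
# Same coding as the answer's ifelse(score > 0, "R", "W")
to_rw <- function(scores) ifelse(scores > 0, "R", "W")

paste(to_rw(c(1, 2)), collapse = "")  # score went up:   "RR"
paste(to_rw(c(2, 1)), collapse = "")  # score went down: "RR"
paste(to_rw(c(0, 1)), collapse = "")  # score went up:   "WR"
```

Telling these cases apart needs the numeric difference between the last two scores, not just their right/wrong labels.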

Any opinions? Thanks!

amisos55
    Just a small tip: `data.frame` will automatically pull column names from variables you give it, so you can do `data <- data.frame(id, item.id, sequence, score)`. You don't need to type out `"id" = id` unless you want to change the name. – Gregor Thomas Nov 06 '19 at 21:02
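A quick check of this tip on a two-row toy version of the data (the values are made up for illustration): both calls give identical data frames with the same column names.

```r
id <- c(1, 2); item.id <- c(1, 1); sequence <- c(1, 1); score <- c(0, 2)

# Explicit names vs. names taken from the variable names
d1 <- data.frame("id" = id, "item.id" = item.id, "sequence" = sequence, "score" = score)
d2 <- data.frame(id, item.id, sequence, score)

identical(d1, d2)  # TRUE
names(d2)          # "id" "item.id" "sequence" "score"
```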

1 Answer


Here's a classification of each row:

library(dplyr)
data = data %>%
  group_by(id, item.id) %>%
  mutate(diff = c(0, diff(score)),
         status = case_when(
           n() == 1 & score == 0 ~ "one.wrong",
           n() == 1 & score > 0 ~ "one.right",
           diff == 0 & score == 0 ~ "WW",
           diff == 0 & score > 0 ~ "RR",
           diff > 0 ~ "WR",
           diff < 0 ~ "RW",
           TRUE ~ "oops"
         ))
print.data.frame(data)
#    id item.id sequence score diff    status
# 1   1       1        1     0    0        WW
# 2   1       1        2     0    0        WW
# 3   1       2        1     0    0 one.wrong
# 4   2       1        1     0    0        WW
# 5   2       1        2     0    0        WW
# 6   2       1        3     1    1        WR
# 7   3       1        1     2    0        RR
# 8   3       1        2     0   -2        RW
# 9   4       1        1     1    0 one.right
# 10  4       2        1     1    0 one.right
# 11  5       1        1     1    0 one.right
# 12  5       2        1     0    0        WW
# 13  5       2        2     1    1        WR
# 14  5       2        3     1    0        RR
# 15  6       1        1     0    0        WW
# 16  6       1        2     0    0        WW
# 17  6       1        3     0    0        WW
# 18  7       1        1     1    0 one.right
# 19  8       1        1     0    0 one.wrong
# 20  8       2        1     2    0 one.right
# 21  9       1        1     1    0        RR
# 22  9       1        2     2    1        WR
# 23 10       1        1     2    0        RR
# 24 10       1        2     1   -1        RW
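The diff column restarts at 0 on each item's first attempt because `c(0, diff(score))` is evaluated separately within each `id`/`item.id` group. As a base-R cross-check (an illustration, not part of the answer), `ave()` applies the same function within each group and reproduces the printed diff column:

```r
id      <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id <- c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
score   <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)

# Change from the previous attempt within each id/item.id group;
# 0 on each group's first attempt (diff() of a single score is empty)
d <- ave(score, id, item.id, FUN = function(s) c(0, diff(s)))
d
```

This relies on the rows already being in sequence order within each group, as they are in the sample data.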

We can then summarize it, taking the last status value:

summarize(data, status = last(status))
# # A tibble: 14 x 3
# # Groups:   id [10]
#       id item.id status   
#    <dbl>   <dbl> <chr>    
#  1     1       1 WW       
#  2     1       2 one.wrong
#  3     2       1 WR       
#  4     3       1 RW       
#  5     4       1 one.right
#  6     4       2 one.right
#  7     5       1 one.right
#  8     5       2 RR       
#  9     6       1 WW       
# 10     7       1 one.right
# 11     8       1 one.wrong
# 12     8       2 one.right       
# 13     9       1 WR       
# 14    10       1 RW    

This appears to match your desired output.
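For comparison, the same rule can be applied without dplyr. The following is a minimal base-R sketch, assuming rows are already sorted by sequence within each id/item.id (as in the sample): it keeps each item's last attempt and compares its score with the previous attempt's score.

```r
# Sample data from the question (rows sorted by sequence within id/item.id)
id       <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id  <- c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score    <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
data <- data.frame(id, item.id, sequence, score)

# Number of attempts per item, and the previous attempt's score (NA on the first)
n_att <- ave(data$score, data$id, data$item.id, FUN = length)
prev  <- ave(data$score, data$id, data$item.id, FUN = function(s) c(NA, head(s, -1)))

# Keep only the last attempt of each id/item.id
is_last <- !duplicated(data[c("id", "item.id")], fromLast = TRUE)
s <- data$score[is_last]
p <- prev[is_last]
n <- n_att[is_last]

out <- data[is_last, c("id", "item.id")]
out$status <- ifelse(n == 1,
                     ifelse(s > 0, "one.right", "one.wrong"),  # single attempt
                     ifelse(s > p, "WR",                       # score went up
                     ifelse(s < p, "RW",                       # score went down
                     ifelse(s == 0, "WW", "RR"))))             # unchanged
rownames(out) <- NULL
out
```

For single attempts the previous score is NA, but those rows always take the `n == 1` branch, so the NA never reaches the result.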

Gregor Thomas
  • Thanks @Gregor, all the code works, but the last part, `summarise(data, status = last(status))`, gives only one row: `status 1 RW` – amisos55 Nov 07 '19 at 01:24
    Probably you loaded `plyr` after `dplyr` and ignored the warnings, [as in this FAQ](https://stackoverflow.com/q/26106146/903061). If you explicitly use `dplyr::summarize` it should work. – Gregor Thomas Nov 07 '19 at 13:59