
I have a dataset with unique Participant_IDs that are each rated by two different Rater_IDs on many different variables (Q1, Q2, and Q3 here). I am trying to find a way to compute a variable which indicates whether the two raters' ratings are within 1 point of each other. Here's a simplified version of the data I'm working with:

library(tidyverse)

Participant_ID <- rep(1:3,2)
Rater_ID <- c(rep("A",3),rep("B",3))
Q1 <- c(5, 2, 1,3, 3, 4)
Q2 <- c(4, 2, 2,3, 5, 2)
Q3 <- c(4, 3, 3,3, 4, 5)

df <- tibble(Participant_ID, Rater_ID, Q1, Q2, Q3)

I can do this by spelling out each check explicitly, using the code below:

df <- df %>%
  group_by(Participant_ID) %>%
  mutate(Check_Q1 = ifelse(abs(Q1[1] - Q1[2]) > 1, 1, 0),
         Check_Q2 = ifelse(abs(Q2[1] - Q2[2]) > 1, 1, 0),
         Check_Q3 = ifelse(abs(Q3[1] - Q3[2]) > 1, 1, 0)) %>%
  ungroup()

Q1 is flagged (assigned a 1) for participant 1, Q2 is flagged for participant 2, and both Q1 and Q3 are flagged for participant 3, as the ratings have a difference > 1.

However, in my real data, there are not only 3 "Q" variables, there are many. Plus, I want this code to be used in a variety of situations where the number of Q variables will change. The user will specify the number_of_questions before running the code. I have been trying to figure out how to do this with a for loop but I cannot figure it out. This is as far as I've gotten:

number_of_questions <- 3
questions <- grep("Q", names(df), value=TRUE)

df <-  df %>% group_by(Participant_ID)

for(q in questions){
for(x in 1:number_of_questions){

    check_varname <- paste0("Check_Q",x)
    
    df <- df %>% 
      mutate(!!check_varname := ifelse((abs(get(q)[1]-get(q)[2]) > 1), 1, 0))   
}}

df <-  df %>% ungroup()

I don't get any errors, but the output is not correct. It is assigning a 1 to Q1, Q2, and Q3 for Participant_ID 3. Can anyone help me understand what I'm doing wrong?

Alexa
    BTW, the use of `ifelse` here is not quite right, though it works. Generally the point of `ifelse` is to be vectorized, but you're always looking at a length-1 logical and returning length-1 (both yes= and no=). `ifelse` itself has baggage (e.g., is not [class-safe](https://stackoverflow.com/q/6668963/3358272)). I suggest either ditching `ifelse` completely (see my comment in jpsmith's answer), or using `if (abs(.[1]-.[2]) > 1) 1 else 0`, or at best shifting to `dplyr::if_else` (which is still overkill for this task). – r2evans Aug 08 '23 at 20:55

2 Answers


You can do this using across() with its .names argument, which builds the new column names from the input column names.

df %>%
  mutate(across(starts_with("Q"), ~ +(abs(.[1] - .[2]) > 1), # thanks @r2evans for improved code
                .names = "Check_{.col}"), .by = Participant_ID)

Output:

  Participant_ID Rater_ID    Q1    Q2    Q3 Check_Q1 Check_Q2 Check_Q3
           <int> <chr>    <dbl> <dbl> <dbl>    <int>    <int>    <int>
1              1 A            5     4     4        1        0        0
2              2 A            2     2     3        0        1        0
3              3 A            1     2     3        1        0        1
4              1 B            3     3     3        1        0        0
5              2 B            3     5     4        0        1        0
6              3 B            4     2     5        1        0        1
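Note that the `.by` argument of `mutate()` was introduced in dplyr 1.1.0. On an older version, the same grouping can be expressed with `group_by()` and `ungroup()`. The following is a minimal sketch under that assumption; `df_checked` is just an illustrative name, and the data is the example from the question.

```r
library(dplyr)
library(tibble)

# Example data from the question
df <- tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = c(rep("A", 3), rep("B", 3)),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

# Same computation as above, but grouping explicitly instead of using .by
df_checked <- df %>%
  group_by(Participant_ID) %>%
  mutate(across(starts_with("Q"),
                ~ +(abs(.x[1] - .x[2]) > 1),  # 1 if the two ratings differ by more than 1
                .names = "Check_{.col}")) %>%
  ungroup()
```

Both forms assume exactly two rows (raters) per participant; with fewer, `.x[2]` is `NA` and so is the flag.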

across() also lets you handle a variable number of questions, for instance:

# consecutive questions
number_of_questions <- 2
qcols <- paste0("Q", seq_len(number_of_questions))

df %>%
  mutate(across(all_of(qcols), ~ +(abs(.x[1] - .x[2]) > 1),
                .names = "Check_{.col}"), .by = Participant_ID)

#  Participant_ID Rater_ID    Q1    Q2    Q3 Check_Q1 Check_Q2
#            <int> <chr>    <dbl> <dbl> <dbl>    <int>    <int>
# 1              1 A            5     4     4        1        0
# 2              2 A            2     2     3        0        1
# 3              3 A            1     2     3        1        0
# 4              1 B            3     3     3        1        0
# 5              2 B            3     5     4        0        1
# 6              3 B            4     2     5        1        0

# Alternative for non consecutive questions, 
# specify columns this way:
number_of_questions <- c(2,6,8)
qcols <- paste0("Q", number_of_questions)

(Note: I assumed all question columns start with "Q".)
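As for why the loop in the question misbehaves: the two loops are nested, so for every question `q` it writes all of Check_Q1 through Check_Q3. After the final pass (`q = "Q3"`), every check column holds the Q3 result, which is why participant 3 is flagged on all three. If a loop is still preferred, a single loop that derives the new column name from `q` should be enough. This is only a sketch: `df_checked` is an illustrative name, and `.data[[q]]` is used in place of `get(q)`.

```r
library(dplyr)
library(tibble)

# Example data from the question
df <- tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = c(rep("A", 3), rep("B", 3)),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

# Anchor the pattern so only names that start with "Q" are picked up
questions <- grep("^Q", names(df), value = TRUE)

df_checked <- df %>% group_by(Participant_ID)

# One loop only: each question writes exactly one check column
for (q in questions) {
  check_varname <- paste0("Check_", q)
  df_checked <- df_checked %>%
    mutate(!!check_varname := as.integer(abs(.data[[q]][1] - .data[[q]][2]) > 1))
}

df_checked <- ungroup(df_checked)
```

The `across()` version is shorter and avoids rebuilding the data frame on each iteration, so the loop is mainly useful for seeing what went wrong.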

jpsmith

It may be more efficient and/or readable to reshape the data so that each row holds one question for one participant, and each rater gets its own column:

df_long <- df %>% 
  pivot_longer(-c(Participant_ID, Rater_ID)) %>% 
  pivot_wider(names_from = Rater_ID, values_from = value)

  Participant_ID name      A     B
           <int> <chr> <dbl> <dbl>
1              1 Q1        5     3
2              1 Q2        4     3
3              1 Q3        4     3
4              2 Q1        2     3
5              2 Q2        2     5
6              2 Q3        3     4
7              3 Q1        1     4
8              3 Q2        2     2
9              3 Q3        3     5

From there, it's easy to create a check column:

df_long %>% 
  mutate(check = abs(A - B) > 1)

  Participant_ID name      A     B check
           <int> <chr> <dbl> <dbl> <lgl>
1              1 Q1        5     3 TRUE 
2              1 Q2        4     3 FALSE
3              1 Q3        4     3 FALSE
4              2 Q1        2     3 FALSE
5              2 Q2        2     5 TRUE 
6              2 Q3        3     4 FALSE
7              3 Q1        1     4 TRUE 
8              3 Q2        2     2 FALSE
9              3 Q3        3     5 TRUE 

And this could be pivoted into a wider format:

df_long %>% 
  mutate(check = abs(A - B) > 1) %>% 
  select(-c(A, B)) %>% 
  pivot_wider(names_from = name, values_from = check, names_prefix = 'check_')

  Participant_ID check_Q1 check_Q2 check_Q3
           <int> <lgl>    <lgl>    <lgl>   
1              1 TRUE     FALSE    FALSE   
2              2 FALSE    TRUE     FALSE   
3              3 TRUE     FALSE    TRUE    
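If the flags are needed next to the original rows, as in the output the question asks for, the wide check table can be joined back onto `df` by Participant_ID. A minimal sketch, assuming the raters are always labelled "A" and "B" as in the example; `checks` and `df_flagged` are illustrative names, and the flags are converted to 0/1 to match the question.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data from the question
df <- tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = c(rep("A", 3), rep("B", 3)),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

# One row per participant, one 0/1 flag column per question
checks <- df %>%
  pivot_longer(-c(Participant_ID, Rater_ID)) %>%
  pivot_wider(names_from = Rater_ID, values_from = value) %>%
  mutate(check = as.integer(abs(A - B) > 1)) %>%
  select(-c(A, B)) %>%
  pivot_wider(names_from = name, values_from = check, names_prefix = "Check_")

# Attach the flags to every original row of the same participant
df_flagged <- left_join(df, checks, by = "Participant_ID")
```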
jdobres