In one of my Qualtrics surveys, each participant received a set of questions presented in a random order.

I now want to determine the position each question (the "Question" variable in the table below) had in that participant's randomized order. In the shortened example, the questions are labeled I1, I2, and I3.

Right now, the data have columns that correspond to presentation order (in the shortened example below, "B1", "B2", and "B3"). That is, the question in column B1 appeared first for that participant, the one in B2 second, and so on.

Here is a file of the data (https://drive.google.com/open?id=1h18SlQ-gmRUZh93M5Y5T3TuE22yxSJbU), and here's what it looks like printed out in R:

> head(testd)
  Question B1 B2 B3
1       I1 I1 I2 I3
2       I1 I3 I2 I1
3       I2 I2 I3 I1
4       I2 I3 I1 I2
5       I3 I2 I1 I3
6       I3 I1 I3 I2

I now want to write a for loop that creates a new variable "RandomizedOrder" in the dataframe testd, telling me whether the question in the "Question" column (e.g., I1) was presented to that participant first (B1), second (B2), or third (B3). For example, RandomizedOrder for row 1 should be B1, because the value in column "Question" is I1 and the value in column "B1" is also I1.

To do this, I first combined the column names "B1", "B2", and "B3" into a character vector, BSet:

testd <- read.csv("TestData.csv")
BSet <- c("B1", "B2", "B3")
testd[BSet]

I then wrote the following for loop. My goal: for each row i, if the value in one of the three BSet columns matches the value in the Question column, RandomizedOrder for that row should be set to the name of that column.

For example, if testd$B1 is I1 in row 1 and testd$Question is also I1 in row 1, the loop should set testd$RandomizedOrder[1] to B1.

for (i in nrow(testd)) {
  for (j in 1:3) {
    if (testd[i,BSet][[j]] == testd$Question[i]) {
      testd$RandomizedOrder[i] <- colnames(testd[i,BSet][j])
    }
  }
}

This is what the R output looks like.

> head(testd$RandomizedOrder)
[1] NA   NA   NA   NA   NA   "B2"

I'm not sure why it produces NA values for every row except the 6th.
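A likely explanation for the NA values is the loop header: `nrow(testd)` is a single number (6 here), so `for (i in nrow(testd))` runs exactly once, with `i = 6`, and only row 6 is ever filled in. Iterating over `seq_len(nrow(testd))` visits every row instead. Below is a minimal corrected sketch, assuming the sample data shown above (rebuilt inline so the block runs on its own). The `as.character()` calls are a precaution: before R 4.0.0, `read.csv()` turned text columns into factors by default, and `==` on two factors with different level sets stops with an error, which could happen in the full 48-question data.

```r
# Sample data from the question, rebuilt inline; stringsAsFactors = TRUE
# mimics the read.csv() default in R versions before 4.0.0
testd <- data.frame(
  Question = c("I1", "I1", "I2", "I2", "I3", "I3"),
  B1 = c("I1", "I3", "I2", "I3", "I2", "I1"),
  B2 = c("I2", "I2", "I3", "I1", "I1", "I3"),
  B3 = c("I3", "I1", "I1", "I2", "I3", "I2"),
  stringsAsFactors = TRUE
)
BSet <- c("B1", "B2", "B3")

# Pre-allocate the new column so every row starts out as NA
testd$RandomizedOrder <- NA_character_

# seq_len(nrow(testd)) is 1, 2, ..., 6, whereas nrow(testd) alone is just 6
for (i in seq_len(nrow(testd))) {
  for (j in seq_along(BSet)) {
    # Compare as character so the factor level sets do not need to agree
    if (as.character(testd[i, BSet[j]]) == as.character(testd$Question[i])) {
      testd$RandomizedOrder[i] <- BSet[j]
    }
  }
}

testd$RandomizedOrder
```

With this data the result is B1, B3, B1, B3, B3, B2. A row whose question does not appear in any BSet column simply keeps its NA.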

Here's what I wanted the for loop to do: make a new variable named "RandomizedOrder" that indicates, for each row, which column contains the value found in the "Question" column.

      Question B1 B2 B3 RandomizedOrder
    1       I1 I1 I2 I3              B1
    2       I1 I3 I2 I1              B3
    3       I2 I2 I3 I1              B1
    4       I2 I3 I1 I2              B3
    5       I3 I2 I1 I3              B3
    6       I3 I1 I3 I2              B2

I checked the individual parts of the code to make sure each one works.

This comparison evaluates to TRUE (both sides of the equality operator return I1):

> testd[1,BSet][[1]] == testd$Question[1]
[1] TRUE

I can also manually tell R to replace a value in testd$RandomizedOrder with a column name.

> testd$RandomizedOrder[1] <- colnames(testd[1,BSet][1])
> head(testd$RandomizedOrder)
[1] "B1" NA   NA   NA   NA   "B2"

Could someone please help me determine why the for loop isn't working?

Thank you in advance.

(Please note that this could easily be done by hand for this 6-observation dataset, but it is a simplified version of my real data, which has 48 questions (I1 through I48) and hundreds of observations. That is why I index the BSet columns with j.)
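For the full dataset, the column names need not be typed out by hand, and the row-by-row loop can be replaced by a single matrix comparison. This is a minimal sketch, assuming the order columns in the real data are named B1 to B48 (that naming is an assumption; the sample uses B1 to B3). `m == q` compares each row of the matrix with the corresponding element of `q`, because R recycles the vector down the columns; `match(TRUE, row)` then returns the position of the first match in each row, or NA if the question is absent from that row.

```r
testd <- data.frame(
  Question = c("I1", "I1", "I2", "I2", "I3", "I3"),
  B1 = c("I1", "I3", "I2", "I3", "I2", "I1"),
  B2 = c("I2", "I2", "I3", "I1", "I1", "I3"),
  B3 = c("I3", "I1", "I1", "I2", "I3", "I2"),
  stringsAsFactors = TRUE
)

# Order columns; for 48 questions this would be paste0("B", 1:48)
# (assumed column names, adjust to the real data)
BSet <- paste0("B", 1:3)

# Character matrix of presented questions, one row per participant
# (as.matrix() converts factor columns to character)
m <- as.matrix(testd[BSet])
q <- as.character(testd$Question)

# hit[i, j] is TRUE when column j of row i holds that row's Question
hit <- m == q

# Index of the first TRUE in each row (NA when there is none)
pos <- apply(hit, 1, function(row) match(TRUE, row))

testd$RandomizedOrder <- BSet[pos]
testd
```

Because indexing `BSet` with NA yields NA, rows without a match come out as NA rather than silently picking a column.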

iambwoo
    This is one of those questions where we may need to see actual data or a mockup that resembles it closely. *"B1" is I1* ... is hard to understand. Also, show desired results. – Parfait Feb 21 '18 at 19:31
  • Hi @Parfait, Thanks for your response. I tried to clarify the language a bit re: "B1" is I1. By that, I mean "the value in column "B1" is I1." I also made a mock-up table of what the desired output would be (see "Here's what I wanted the for loop to do"). The actual data resemble the example data in the first block of output (see "head(testd)" in the post). I just want to get the for loop to work for the example data. – iambwoo Feb 21 '18 at 20:31
  • @Parfait I also added a link to the example data to make it easy for people to open the data in R, if they want to. – iambwoo Feb 21 '18 at 20:37

1 Answer

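Consider an lapply() across the dataframe's order-column names to mark, for each column, the rows where it matches Question, followed by Reduce() with a coalesce function that collapses those per-column results into a single vector for the RandomizedOrder assignment.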

txt = "Question B1 B2 B3
1       I1 I1 I2 I3
2       I1 I3 I2 I1
3       I2 I2 I3 I1
4       I2 I3 I1 I2
5       I3 I2 I1 I3
6       I3 I1 I3 I2"

testd <- read.table(text=txt, header=TRUE)

colList <-  lapply(names(testd)[-1], function(i)
  ifelse(testd$Question == testd[[i]], i, NA))

testd$RandomizedOrder <- Reduce(function(x, y) {
  x[which(is.na(x))] <- y[which(is.na(x))]
  x}, colList)

testd    
#   Question B1 B2 B3 RandomizedOrder
# 1       I1 I1 I2 I3              B1
# 2       I1 I3 I2 I1              B3
# 3       I2 I2 I3 I1              B1
# 4       I2 I3 I1 I2              B3
# 5       I3 I2 I1 I3              B3
# 6       I3 I1 I3 I2              B2
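The `Reduce()` step acts as a coalesce: it walks through the per-column vectors in order and fills each remaining NA from the next vector. A small illustration on made-up input (the vectors below are hypothetical, chosen only to show the mechanics):

```r
# Hypothetical per-column results: each element holds the column name
# where that column matched the Question, NA elsewhere
colList <- list(
  c("B1", NA,   NA),
  c(NA,   NA,   "B2"),
  c(NA,   "B3", NA)
)

# Same idea as the function in the answer: keep x where it is already
# set, otherwise take the value from y
coalesce2 <- function(x, y) {
  x[is.na(x)] <- y[is.na(x)]
  x
}

Reduce(coalesce2, colList)
# [1] "B1" "B3" "B2"
```

Logical indexing with `is.na(x)` gives the same result as the `which(is.na(x))` form used above. If the dplyr package is available, its `coalesce()` function performs the same operation.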
Parfait
  • Great, thanks so much for your suggestions with this. I'll try doing something similar with my own data. – iambwoo Feb 22 '18 at 01:12