I'm trying to write this piece of code using a for loop.

#Took Quiz X and 1
TookQuizX[1,1] <- nrow(Q1[Q1$anon_user_id %in% Q1$anon_user_id,])
TookQuizX[2,1] <- nrow(Q2[Q2$anon_user_id %in% Q1$anon_user_id,])
TookQuizX[3,1] <- nrow(Q3[Q3$anon_user_id %in% Q1$anon_user_id,])
TookQuizX[4,1] <- nrow(Q4[Q4$anon_user_id %in% Q1$anon_user_id,])
TookQuizX[5,1] <- nrow(Q5[Q5$anon_user_id %in% Q1$anon_user_id,])
TookQuizX[6,1] <- nrow(Q6[Q6$anon_user_id %in% Q1$anon_user_id,])

What I tried is the following

for(i in 1:6){
  Qx<-paste("Q",i,"[Q",i,"$anon_user_id",sep="")
  TookQuizX[i,1] <- nrow(Qx %in% Q1$anon_user_id,])
}

When I run my loop I get the following error:

Error: unexpected ']' in:
"  Qx<-paste("Q",i,"[Q",i,"$anon_user_id",sep="")
  TookQuizX[i,1] <- nrow(Qx %in% Q1$anon_user_id,]"
> }
Error: unexpected '}' in "}

What am I doing wrong?

Thanks!


This very simple example hopefully illustrates what I'm trying to do:

TookQuizX <- matrix(data=NA,nrow=3,ncol=1)
Q1 <- data.frame(anon_user_id = c("A123", "A111", "A134", "A156"), other_stuf=999)
Q2 <- data.frame(anon_user_id = c("A123", "A234", "A111", "A256", "C521"), other_stuf=999)
Q3 <- data.frame(anon_user_id = c("A123", "A234", "A111", "A356", "B356"), other_stuf=999)

TookQuizX[1,1] <- nrow(Q1[Q1$anon_user_id %in% Q1$anon_user_id,])
TookQuizX[2,1] <- nrow(Q2[Q2$anon_user_id %in% Q1$anon_user_id,])
TookQuizX[3,1] <- nrow(Q3[Q3$anon_user_id %in% Q1$anon_user_id,])
Ignacio
  • Your main mistake is using a `for` loop. Please make your question [reproducible](http://stackoverflow.com/a/5963610/1412059) to enable us to show you alternatives. – Roland Nov 21 '13 at 19:38
  • Of course the first comment @Roland would be condescension rather than something helpful. Ignacio, set `i <- 1` and go through your loop line-by-line to see what it is actually doing. – rawr Nov 21 '13 at 19:41
  • @rawr It's not condescension, but honest advice. I might even have shown a far better alternative, if the OP provided example data. But without data I cannot test ... – Roland Nov 21 '13 at 19:44
  • Hi, this part does not make sense: `Q1$anon_user_id %in% Q1$anon_user_id`. Also, you are looking for `eval(parse(text=...))`, BUT I would advise against using that. Instead, use lists. Search through SO, as there are plenty of examples. – Ricardo Saporta Nov 21 '13 at 19:55
  • @RicardoSaporta he's collecting the number of rows in each matrix that contain the anon_user_id values found in the first matrix. So the first line looks funny, but it makes sense when applied to the following matrices. Still better as a list of matrices though. – Tyler Nov 21 '13 at 20:04

3 Answers

As with many operations in R, it is easier to wrap your data frames in a list.

Q_all <- list(Q1,Q2,Q3)
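
If the data frames already exist as separately named objects (Q1, Q2, ...), typing every name into list() gets tedious. One possible shortcut is mget(), which looks up several objects by name and returns them as a named list. This is only a sketch, assuming the toy data from the question and that the objects live in the current environment:

```r
# Toy data copied from the question's example
Q1 <- data.frame(anon_user_id = c("A123", "A111", "A134", "A156"), other_stuf = 999)
Q2 <- data.frame(anon_user_id = c("A123", "A234", "A111", "A256", "C521"), other_stuf = 999)
Q3 <- data.frame(anon_user_id = c("A123", "A234", "A111", "A356", "B356"), other_stuf = 999)

# Build the names "Q1", "Q2", "Q3" and fetch the matching objects as a named list
Q_all <- mget(paste0("Q", 1:3))
names(Q_all)
```

The list elements keep their names, so Q_all$Q2 and Q_all[[2]] both refer to the second quiz.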

First, instead of using nrow on the subsetted data frame, why not directly count how many TRUE values there are in your %in% vector?

TookQuizX[1,1] <- length(which(Q1$anon_user_id %in% Q1$anon_user_id))
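
A slightly shorter way to do the same count is sum(), since TRUE is treated as 1 and FALSE as 0 when a logical vector is summed. A small sketch on two of the toy data frames from the question (took_both is just an illustrative name):

```r
# Toy data copied from the question's example
Q1 <- data.frame(anon_user_id = c("A123", "A111", "A134", "A156"), other_stuf = 999)
Q2 <- data.frame(anon_user_id = c("A123", "A234", "A111", "A256", "C521"), other_stuf = 999)

# %in% returns one TRUE/FALSE per Q2 user; sum() counts the TRUE values
took_both <- sum(Q2$anon_user_id %in% Q1$anon_user_id)
took_both
```

On this data it should give the same number as the nrow() and length(which()) versions.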

To replace your loop, here is an example of lapply:

TookQuizX[,1] <- unlist(lapply(Q_all, function(x) length(which(x$anon_user_id %in% Q_all[[1]]$anon_user_id))))

I assume that in the end, you want TookQuizX to be a matrix where entry i,j is the number of people who took Quiz i and also took Quiz j. Additionally, I assume that your user IDs are unique, so no two rows in a data frame have the same user ID. Then let's extract just the user IDs from your data frames.

anon_user_ids <- lapply(Q_all, `[[`, "anon_user_id")

One way of putting this together (and there are more efficient ways, but this is what came to mind first) would be to use Map:

tmp <- Map(function(x,y) length(which(x %in% y)),
  anon_user_ids[rep(seq_along(anon_user_ids),times = length(anon_user_ids))] ,
  anon_user_ids[rep(seq_along(anon_user_ids),each = length(anon_user_ids))] )

This computes the overlap of quizzes i and j for every pair, in the order (1,1), (2,1), (3,1), (1,2), (2,2), and so forth. Now I can put this into a matrix. By default, R fills matrices and arrays in column-major order (the first dimension varies quickest, and the last dimension varies slowest), which matches that ordering.

TookQuizX <- matrix(unlist(tmp), nrow = length(anon_user_ids))
#      [,1] [,2] [,3]
# [1,]    4    2    2
# [2,]    2    5    3
# [3,]    2    3    5
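
For comparison, the same pairwise table can probably be built with two nested sapply calls, which avoids the rep() index bookkeeping. This is a sketch rather than a tuned solution; ids and overlap are illustrative names, and the ID vectors are copied from the question's toy data:

```r
# User IDs per quiz, copied from the question's example
ids <- list(
  Q1 = c("A123", "A111", "A134", "A156"),
  Q2 = c("A123", "A234", "A111", "A256", "C521"),
  Q3 = c("A123", "A234", "A111", "A356", "B356")
)

# Outer sapply picks the column quiz y, inner sapply the row quiz x;
# each cell counts how many IDs of x also appear in y
overlap <- sapply(ids, function(y) sapply(ids, function(x) sum(x %in% y)))
overlap
```

Because the list is named, the resulting matrix carries Q1, Q2, Q3 as row and column names.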
Blue Magister

You need to do two things. First, you need to recreate the commands you want to run:

for(i in 1:6){
  Qx <- paste("TookQuizX[", i, ",1] <- nrow(Q", i, "[Q", i,
              "$anon_user_id %in% Q1$anon_user_id,])", sep = "")
  print(Qx)
}

This loop produces the strings you want to evaluate as code. To run them, you need to tell R to interpret the character strings as actual code. That involves parsing the text into code, and then evaluating the code. Modifying the first loop we get:

for(i in 1:6){
  Qx <- paste("TookQuizX[", i, ",1] <- nrow(Q", i, "[Q", i,
              "$anon_user_id %in% Q1$anon_user_id,])", sep = "")
  eval(parse(text = Qx))
}
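
If the goal is just to keep the for loop, a version without eval(parse()) is also possible: get() fetches an object by its name, so only the name needs to be pasted together, not the whole command. A minimal sketch, assuming the three toy data frames from the question (Qi is an illustrative name):

```r
# Toy data copied from the question's example
Q1 <- data.frame(anon_user_id = c("A123", "A111", "A134", "A156"), other_stuf = 999)
Q2 <- data.frame(anon_user_id = c("A123", "A234", "A111", "A256", "C521"), other_stuf = 999)
Q3 <- data.frame(anon_user_id = c("A123", "A234", "A111", "A356", "B356"), other_stuf = 999)

TookQuizX <- matrix(data = NA, nrow = 3, ncol = 1)

for (i in 1:3) {
  # Fetch the data frame named "Q1", "Q2", ... by its name
  Qi <- get(paste0("Q", i))
  # Count the rows whose user ID also appears in Q1
  TookQuizX[i, 1] <- nrow(Qi[Qi$anon_user_id %in% Q1$anon_user_id, ])
}
TookQuizX
```

This keeps the loop body as ordinary R code, so syntax errors show up when the script is parsed instead of hiding inside a string.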
Tyler
  • Thanks! eval(parse(text = Qx)) is new to me, and it will be useful in the future. – Ignacio Nov 21 '13 at 21:04
  • @Ignacio No, don't use it. It's a sign of inadequate understanding of the R language, and its use usually indicates badly written code. – Roland Nov 22 '13 at 07:50

Here's an example that solves a simplified version of what I think you're trying to accomplish.

x1 = 34
x2 = 65
x3 = 87
x4 = 298
x5 = 384
x6 = 234

var.names = sapply(1:6, function(i){
    paste0("x", i)
})

var.values = sapply(var.names, get)

 #x1  x2  x3  x4  x5  x6 
 #34  65  87 298 384 234 
kith