R nested for multiple if loops to generate new vector

Question

I have 20 workers doing 100 tasks each. I have generated the true answer for each task, which is one of five possible answers, with:

answers <- c("liver", "blood", "lung", "brain", "heart")
truth <- sample(answers, no.tasks, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))

My dataSet contains the columns workerID, taskID and truth. Now I need to generate another vector that simulates what each worker will answer, based on a certain probability. For example, if the truth for task 1, worker 1 is "liver", I want worker 1 to answer "liver" for task 1 with a high probability. I want the workers' answers generated this way for each of the five possible answers, across all 2000 rows. For that I am using the following for loop with nested if statements.

for (i in nrow(dataSet)){
if (dataSet$truth[i] == "liver")
{
df <- (rep(sample(answers, no.tasks, prob = c(0.9, 0.02, 0.02, 0.02, 0.02), no.workers)))
} else if (dataSet$truth[i] == "blood")
{ 
df <-  (rep(sample(answers, no.tasks, prob = c(0.02, 0.9, 0.02, 0.02, 0.02), no.workers)))
} else if (dataSet$truth[i] == "lung")
{
df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.9, 0.02, 0.02), no.workers)))
} else if (dataSet$truth[i] == "brain")
{
df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.9, 0.02), no.workers)))
} else if (dataSet$truth[i] == "heart")
{
df <-  (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.02, 0.9), no.workers)))
} else {
df <- (rep(sample(answers, no.tasks, prob = c(0.2, 0.2, 0.2, 0.2, 0.2), no.workers)))
}
}

But since my truth for task 1 is "brain", the output vector df has a lot of answers which are "brain". Can someone please give a hint as to what is going wrong here?

I haven't tried running your code yet, but looking at it, it doesn't look like you are actually storing your result each round; instead you are overwriting `df` every time. Try adding a statement at the top, `df <- matrix(nrow = nrow(dataSet), ncol = no.tasks)`, and make your assignments `df[i, ] <- ...` — Barker, Sep 30 '16 at 00:17
Please show expected output. Only one vector? One vector per answer per task? — Parfait, Sep 30 '16 at 01:28
And what should that vector look like given example data? This helps us reproduce. — Parfait, Sep 30 '16 at 03:40
@Parfait just a vector of 2000 (20 workers * 100 tasks) values, each being one of the five answers: > df [1] "liver" "heart" "blood" "lung" "lung" "lung" "liver" "blood" "lung" "blood" "heart" [12] "blood" "blood" "lung" "liver" "brain" "brain" "lung" "liver" "lung" "lung" "blood" [23] "liver" "lung" "heart" "heart" "blood" "liver" "lung" "brain" "brain" "blood" "blood" .... — amrapaliz, Sep 30 '16 at 04:16
OK, so I changed two things: (1) I added `df <- vector(mode="character", length=2000)`, and (2) I changed the loop to `for (i in 1:nrow(dataSet))`; the `1:` was missing. When I run the loop, I get the vector that I want, but I also get this warning: `In df[i] <- (rep(sample(answers, no.tasks, prob = c(0.02, ... : number of items to replace is not a multiple of replacement length`. But this is OK to ignore, right? Because I am replacing each value in the same vector? — amrapaliz, Sep 30 '16 at 04:44
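
The two points raised in these comments can be reproduced on a toy example. This is a small sketch, not the asker's data: the names `toy`, `visited`, `visited_all` and `out` are hypothetical. It shows that `for (i in nrow(x))` runs a single iteration, and that assigning a multi-element vector into one slot keeps only the first element, which is what the "number of items to replace" warning is reporting.

```r
# Hypothetical toy data frame standing in for dataSet
toy <- data.frame(truth = c("liver", "blood", "lung"), stringsAsFactors = FALSE)

# nrow(toy) is the single number 3, so this loop body runs once (with i = 3)
visited <- integer(0)
for (i in nrow(toy)) visited <- c(visited, i)

# seq_len(nrow(toy)) is 1, 2, 3, so this loop visits every row
visited_all <- integer(0)
for (i in seq_len(nrow(toy))) visited_all <- c(visited_all, i)

# Assigning several values into a single slot keeps only the first one.
# R reports this with the "number of items to replace" warning, which is
# suppressed here only to keep the sketch quiet.
out <- vector(mode = "character", length = 3)
suppressWarnings(out[1] <- c("a", "b", "c"))

length(visited)      # number of iterations of the first loop
length(visited_all)  # number of iterations of the second loop
out[1]               # the one value that was actually stored
```

Read this way, the warning means that all but the first sampled value are thrown away in each iteration. Sampling a single value per row, or storing the whole vector in a list element, avoids it.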

Parfait · Accepted Answer · 2016-10-01T00:35:58.333

Consider initializing a list, each element of which holds a character vector of no.tasks sampled answers.

df <- vector("list", 2000) 

for (i in 1:nrow(dataSet)){
if (dataSet$truth[i] == "liver")
{
df[[i]] <-(rep(sample(answers, no.tasks, prob = c(0.9, 0.02, 0.02, 0.02, 0.02), no.workers)))
} else if (dataSet$truth[i] == "blood")
{ 
df[[i]] <-(rep(sample(answers, no.tasks, prob = c(0.02, 0.9, 0.02, 0.02, 0.02), no.workers)))
} else if (dataSet$truth[i] == "lung")
{
df[[i]] <-(rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.9, 0.02, 0.02), no.workers)))
} else if (dataSet$truth[i] == "brain")
{
df[[i]] <-(rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.9, 0.02), no.workers)))
} else if (dataSet$truth[i] == "heart")
{
df[[i]] <-(rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.02, 0.9), no.workers)))
} 
}

Alternatively, you can use lapply(), which returns a list of the same length as its input (here, the rows of dataSet) and so requires no initialization:

df2 <- lapply(seq_len(nrow(dataSet)), function(i){
  if (dataSet$truth[i] == "liver")
  {
  temp <- (rep(sample(answers, no.tasks, prob = c(0.9, 0.02, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "blood")
  { 
  temp <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.9, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "lung")
  {
  temp <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.9, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "brain")
  {
  temp <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.9, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "heart")
  {
  temp <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.02, 0.9), no.workers)))
  } 
  return(temp)
})
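
The shape of what lapply() returns can be checked on a small stand-in. The following is only an illustrative sketch: `toy`, `res` and the reduced `no.tasks` value are assumptions made here, and the sampling is uniform to keep the example short.

```r
answers  <- c("liver", "blood", "lung", "brain", "heart")
no.tasks <- 4   # kept small so the result is easy to inspect

# Hypothetical three-row stand-in for dataSet
toy <- data.frame(truth = c("liver", "brain", "heart"), stringsAsFactors = FALSE)

# One list element per row, each holding no.tasks sampled answers
res <- lapply(seq_len(nrow(toy)), function(i) {
  sample(answers, no.tasks, replace = TRUE)
})

length(res)          # same as nrow(toy)
lengths(res)         # each element holds no.tasks values
length(unlist(res))  # flattening gives nrow(toy) * no.tasks values
```

So the result is a list of vectors rather than a single flat vector. Flattening it with unlist() yields no.tasks values per row, not one.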

Even better, you can trim down the nested if statements by matching the current dataSet$truth value in the answers vector and then replacing the corresponding element of the probability vector with 0.9:

df3 <- lapply(seq_len(nrow(dataSet)), function(i){
  probs <- c(0.02, 0.02, 0.02, 0.02, 0.02)
  probs[match(dataSet$truth[i], answers)] <- 0.9

  temp <- (rep(sample(answers, no.tasks, prob = probs, no.workers)))
})
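
If the goal is the flat vector described in the comments (one simulated answer per row of dataSet), one possible variant of the same match() idea is to draw a single value per row and collect the results with vapply(). This is a sketch under assumptions rather than part of the answer above: the reconstructed `dataSet`, the seed, and the helper name `simulate_answer` are all hypothetical.

```r
set.seed(42)  # arbitrary seed, only so this sketch is reproducible

answers    <- c("liver", "blood", "lung", "brain", "heart")
no.workers <- 20
no.tasks   <- 100

# Hypothetical stand-in for the asker's dataSet: one row per worker/task pair
truth_by_task <- sample(answers, no.tasks, replace = TRUE)
dataSet <- data.frame(
  workerID = rep(seq_len(no.workers), each = no.tasks),
  taskID   = rep(seq_len(no.tasks), times = no.workers),
  stringsAsFactors = FALSE
)
dataSet$truth <- truth_by_task[dataSet$taskID]

# Hypothetical helper: draw one answer, favouring the true one
simulate_answer <- function(true_answer) {
  probs <- rep(0.02, length(answers))
  probs[match(true_answer, answers)] <- 0.9
  sample(answers, size = 1, prob = probs)  # weights need not sum to 1
}

# character(1) makes vapply() check that each call returns exactly one string
worker_answer <- vapply(dataSet$truth, simulate_answer, character(1),
                        USE.NAMES = FALSE)

length(worker_answer)                 # one answer per row of dataSet
mean(worker_answer == dataSet$truth)  # share of answers matching the truth
```

Because sample() treats `prob` as weights, the 0.9 and 0.02 values are rescaled internally, so the chance of a correct answer is 0.9 / 0.98 rather than exactly 0.9.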

Yes, thank you, the lapply function is exactly what I wanted. That works well and gets rid of the loop, which is perfect because I will be working with larger data. — amrapaliz, Sep 30 '16 at 23:34
Great! Please accept if answer helped and confirms resolution. Also, `lapply()` is technically still a loop but a vectorized one and provides more clarity. See: http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar — Parfait, Oct 01 '16 at 00:34
Question: after I get the answers, I want to compare them with the answers from the dataSet to calculate the inter-rater agreement, i.e. the kappa value. But when I run this program 100 times, some of the irr values come out negative. Do you have a clue as to why they would be negative? — amrapaliz, Oct 13 '16 at 20:37
That might need to be a new question as I am not aware of your *irr* process. — Parfait, Oct 14 '16 at 03:04
