
I'm having a looping issue. It should be simple to solve, but "R for Stata Users" (I've coded in Stata for a couple of years), Roger Peng's videos, and Google don't seem to be helping me. Can one of you please explain to me what I'm doing wrong?

I'm trying to write a loop that runs through the 'thresholds' dataframe to pull out information from three sets of columns. I can do what I want to do by writing the same segment of code three times, but as the code gets more complicated, this will become quite cumbersome.

Here is a sample of 'thresholds' (see dput output below, added by a friendly reader):

    threshold_1_name      threshold_1_dir threshold_1_value
1   overweight            >                25
2   possible malnutrition <                31
3   Q1                    >                998
4   Q1                    >                998
5   Q1                    >                998
6   Q1                    >                998
    threshold_1_units threshold_2_name threshold_2_dir threshold_2_value threshold_2_units
1   kg/m^2            obese               >             30                kg/m^2
2   cm                <NA>                >             NA                   
3   <NA>              Q3                  >             998                  
4                     Q3                  >             998                  
5                     Q3                  >             998                  
6                     Q3                  >             998  

This code does what I want to do:

newvars1 <- paste(thresholds$varname, thresholds$threshold_1_name, sep = "_")
noval <- is.na(thresholds$threshold_1_value)
newvars1 <- newvars1[!noval]

newvars2 <- paste(thresholds$varname, thresholds$threshold_2_name, sep = "_")
noval <- is.na(thresholds$threshold_2_value)
newvars2 <- newvars2[!noval]

newvars3 <- paste(thresholds$varname, thresholds$threshold_3_name, sep = "_")
noval <- is.na(thresholds$threshold_3_value)
newvars3 <- newvars3[!noval]

And here is how I am trying to loop:

variables <- NULL
for (i in 1:3) {
  valuevar <- paste("threshold", i, "value", sep = "_")
  namevar <- paste("threshold", i, "name", sep = "_")
  newvar <- paste("varnames", i, sep = "")
  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check == FALSE) {
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }
  variables <- c(variables, newvars)
}

And here is the error I am receiving:

Error: unexpected '}' in "}"

I think something about the way I am referring to 'i' is messing things up, but I'm not sure how to do it correctly. My Stata habits using locals are really biting me in the butt as I switch to R.
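For anyone else coming from Stata: R has no macro substitution, so there is no direct equivalent of expanding a local inside a variable name. A minimal sketch of the usual R idiom, assuming the column layout from the dput below (only the name and value columns are recreated here): build the column name as a string with paste0() and pass it to [[ ]], which returns that column as a plain vector.

```r
# Recreate the name/value columns of 'thresholds' from the question's dput
thresholds <- data.frame(
  varname           = factor(c("varA", "varB", "varC", "varD", "varE", "varF")),
  threshold_1_name  = c("overweight", "possible malnutrition", "Q1", "Q1", "Q1", "Q1"),
  threshold_1_value = c(25L, 31L, 998L, 998L, 998L, 998L),
  threshold_2_name  = c("obese", "<NA>", "Q3", "Q3", "Q3", "Q3"),
  threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L),
  stringsAsFactors  = FALSE
)

newvars <- list()
for (i in 1:2) {
  # paste0() plays the role of a Stata local: it builds the column name as text
  namecol  <- paste0("threshold_", i, "_name")
  valuecol <- paste0("threshold_", i, "_value")
  # [[ ]] with a string pulls out that column as an ordinary vector
  keep <- !is.na(thresholds[[valuecol]])
  newvars[[i]] <- paste(thresholds$varname, thresholds[[namecol]], sep = "_")[keep]
}
newvars
```

The key difference from Stata is that the name is just a character value; nothing is substituted into the code itself.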

EDIT to add dput output, by a friendly reader:

thresholds <- structure(list(varname = structure(1:6, .Label = c("varA", "varB", 
"varC", "varD", "varE", "varF"), class = "factor"), threshold_1_name = c("overweight", 
"possible malnutrition", "Q1", "Q1", "Q1", "Q1"), threshold_1_dir = c(">", 
"<", ">", ">", ">", ">"), threshold_1_value = c(25L, 31L, 998L, 
998L, 998L, 998L), threshold_1_units = c("kg/m^2", "cm", NA, 
NA, NA, NA), threshold_2_name = c("obese", "<NA>", "Q3", "Q3", 
"Q3", "Q3"), threshold_2_dir = c(">", ">", ">", ">", ">", ">"
), threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L), threshold_2_units = c("kg/m^2", 
"cm", NA, NA, NA, NA)), .Names = c("varname", "threshold_1_name", 
"threshold_1_dir", "threshold_1_value", "threshold_1_units", 
"threshold_2_name", "threshold_2_dir", "threshold_2_value", "threshold_2_units"
), row.names = c(NA, -6L), class = "data.frame")
  • The immediate error is that you are missing an end-paren on the line `for (j in 1:length(thresholds$varname) {`. – Blue Magister Dec 21 '12 at 21:33
  • @BlueMagister I don't see that. Line 11 of his code contains the closer for that. – Brandon Bertelsen Dec 21 '12 at 21:34
  • @BrandonBertelsen Line 11 closes the curly brace, but there is no closing parenthesis for the `for` statement. – Blue Magister Dec 21 '12 at 21:36
  • Can you provide a sample of the data frame you are using? Something like copy-pasting `dput(head(thresholds))`? See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for making a good reproducible example. – Blue Magister Dec 21 '12 at 21:38
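The dput() route suggested in that comment works because dput() writes out R code that rebuilds the object. A small sketch with a made-up data frame (df and its values are purely illustrative, not the question's data): the text dput() prints can be pasted back into R, and evaluating it gives the same data frame.

```r
# A small illustrative data frame (hypothetical; not the question's data)
df <- data.frame(
  id    = factor(c("varA", "varB", "varC")),
  value = c(25L, NA, 998L),
  units = c("kg/m^2", "cm", NA),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; this is what gets pasted into a question
dput(head(df))

# capture.output() grabs that printed code as text so the round trip can be checked
code <- paste(capture.output(dput(head(df))), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(rebuilt, df)
```

Using head() keeps the pasted output short while still preserving column types such as factors and NA values.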

3 Answers


The first problem I see is in if(check = "FALSE"): = is assignment, and if you're testing a condition it needs to be == (R actually refuses to parse = inside an if() condition). Also, quoting the word "FALSE" means you're testing the variable against the string "FALSE" (literally the word FALSE), not the logical value, which is FALSE without the quotation marks.
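A quick way to see both points from the console, assuming a single logical check like the one is.na() produces for one value: the parser rejects = inside if() before anything runs, and the quoted "FALSE" is a character string rather than a logical.

```r
check <- is.na(25L)   # a single logical value: FALSE

# '=' inside if() is rejected by the parser before anything runs
res <- tryCatch(parse(text = 'if (check = "FALSE") print("no value")'),
                error = function(e) conditionMessage(e))
print(res)   # the parser reports an unexpected '=' error

# The quoted word is a character string; the unquoted one is a logical
class("FALSE")   # "character"
class(FALSE)     # "logical"

# '==' is the comparison operator
check == FALSE   # TRUE
```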

The second problem was rightly pointed out by @BlueMagister: you're missing the closing ) in for (j in 1:length(...)) {

See # bad!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check = "FALSE") { # bad!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }

See # good!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check == FALSE) { # good!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }

But because if() only needs a logical value, you can use even simpler logic: check is already TRUE or FALSE, so negate it with ! instead of comparing it to FALSE.

See # better!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (!check) { # better!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }
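One more thing worth checking in these snippets: thresholds[valuevar[j]] uses single brackets, which return a one-column data frame, and valuevar[j] is NA once j is greater than 1, since valuevar holds a single column name. if() needs a single TRUE/FALSE, so the column has to be pulled out with [[ ]] and then indexed by j. A small sketch with an illustrative data frame (df and its values are hypothetical):

```r
# Illustrative data frame with one value column (hypothetical values)
df <- data.frame(threshold_2_value = c(30L, NA, 998L))
valuevar <- "threshold_2_value"

# Single brackets keep the data-frame structure, so is.na() returns a 3 x 1 matrix,
# which is too long to use as an if() condition
class(df[valuevar])          # "data.frame"
dim(is.na(df[valuevar]))     # 3 1

# Double brackets return the column as a vector; [j] then gives one value
kept <- integer(0)
for (j in seq_len(nrow(df))) {
  check <- is.na(df[[valuevar]][j])   # a single TRUE/FALSE
  if (!check) kept <- c(kept, j)
}
kept   # 1 3
```

seq_len(nrow(df)) is used instead of 1:nrow(df) so the loop simply does nothing on a zero-row data frame.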
Brandon Bertelsen

There is obviously a missing bracket in your for loop. You should consider using an editor that supports brace matching to avoid those kinds of errors.
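Besides editor brace matching, R itself can check a script for unbalanced brackets without running any of it: parse() reads the code and reports the first syntax error with its line and column. A sketch, where the script is a temporary file written just for the demonstration:

```r
# Write a small script containing the original typo (missing ')' in the for header)
script <- tempfile(fileext = ".R")
writeLines(c(
  "for (j in 1:length(thresholds$varname) {",
  "  print(j)",
  "}"
), script)

# parse() only reads the code; nothing is executed
msg <- tryCatch({
  parse(file = script)
  "no syntax errors"
}, error = function(e) conditionMessage(e))
cat(msg, "\n")
```

The message points at the line and column where the parser gave up, which is usually right next to the missing bracket.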

ed82

I think the easiest thing to do would be to just write a function that does what your desired non-looping code does. For reference, here's the output from that code, using the dput output from the edit to your question.

> newvars1 <- paste(thresholds$varname, thresholds$threshold_1_name, sep = "_")
> newvars1 <- newvars1[!is.na(thresholds$threshold_1_value)]
> newvars2 <- paste(thresholds$varname, thresholds$threshold_2_name, sep = "_") 
> newvars2 <- newvars2[!is.na(thresholds$threshold_2_value)]
> c(newvars1, newvars2)
 [1] "varA_overweight"            "varB_possible malnutrition"
 [3] "varC_Q1"                    "varD_Q1"                   
 [5] "varE_Q1"                    "varF_Q1"                   
 [7] "varA_obese"                 "varC_Q3"                   
 [9] "varD_Q3"                    "varE_Q3"                   
[11] "varF_Q3"  

Here's what that function would look like:

unlist(lapply(1:2, function(k) {
  newvars <- paste(thresholds$varname, 
                   thresholds[[paste("threshold", k, "name", sep="_")]], sep = "_")
  newvars <- newvars[!is.na(thresholds[[paste("threshold", k, "value", sep="_")]])]
}))
# [1] "varA_overweight"            "varB_possible malnutrition"
# [3] "varC_Q1"                    "varD_Q1"                   
# [5] "varE_Q1"                    "varF_Q1"                   
# [7] "varA_obese"                 "varC_Q3"                   
# [9] "varD_Q3"                    "varE_Q3"                   
#[11] "varF_Q3"  
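Since the question mentions three sets of columns but the dput has only two, a variation on the lapply() version that finds the threshold numbers from the column names avoids editing 1:2 by hand. A sketch; make_newvars() is a hypothetical helper name, and it assumes every set follows the threshold_<k>_name / threshold_<k>_value naming pattern:

```r
# Name/value columns from the question's dput
thresholds <- data.frame(
  varname           = factor(c("varA", "varB", "varC", "varD", "varE", "varF")),
  threshold_1_name  = c("overweight", "possible malnutrition", "Q1", "Q1", "Q1", "Q1"),
  threshold_1_value = c(25L, 31L, 998L, 998L, 998L, 998L),
  threshold_2_name  = c("obese", "<NA>", "Q3", "Q3", "Q3", "Q3"),
  threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: build varname_thresholdname labels for every threshold set present
make_newvars <- function(df) {
  # Find the threshold numbers (k) from the column names, e.g. "1", "2"
  name_cols <- grep("^threshold_[0-9]+_name$", names(df), value = TRUE)
  ks <- sub("^threshold_([0-9]+)_name$", "\\1", name_cols)
  unlist(lapply(ks, function(k) {
    labels <- paste(df$varname, df[[paste0("threshold_", k, "_name")]], sep = "_")
    # Drop labels whose threshold value is missing
    labels[!is.na(df[[paste0("threshold_", k, "_value")]])]
  }))
}

res <- make_newvars(thresholds)
res
```

If a threshold_3 set is added to the data later, the same call picks it up without any change.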

I tried to figure out what was going on in your loop, but there was a lot in there that didn't make sense to me; here's how I'd write it if I were going to loop in that way.

variables <- NULL
for (i in 1:2) {
  valuevar <- paste("threshold", i, "value", sep = "_")
  namevar <- paste("threshold", i, "name", sep = "_")
  newvars <- c()
  for (j in 1:nrow(thresholds)) { 
    if (!is.na(thresholds[[valuevar]][j])) {
      newvars <- c(newvars, paste(thresholds$varname[j], 
                                  thresholds[[namevar]][j], sep = "_"))
    }
  }
  variables <- c(variables, newvars)
}
variables
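Coming from Stata, another option is to treat this as a reshape long: stack each threshold set into rows, then drop the rows with no value. A base-R sketch of that idea (it builds the long table by hand with rbind() rather than using reshape()), again recreating only the name/value columns:

```r
thresholds <- data.frame(
  varname           = factor(c("varA", "varB", "varC", "varD", "varE", "varF")),
  threshold_1_name  = c("overweight", "possible malnutrition", "Q1", "Q1", "Q1", "Q1"),
  threshold_1_value = c(25L, 31L, 998L, 998L, 998L, 998L),
  threshold_2_name  = c("obese", "<NA>", "Q3", "Q3", "Q3", "Q3"),
  threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L),
  stringsAsFactors  = FALSE
)

# One small data frame per threshold set, stacked into a long table
long <- do.call(rbind, lapply(1:2, function(k) {
  data.frame(
    varname = thresholds$varname,
    set     = k,
    name    = thresholds[[paste0("threshold_", k, "_name")]],
    value   = thresholds[[paste0("threshold_", k, "_value")]],
    stringsAsFactors = FALSE
  )
}))

# Keep rows that actually have a threshold value, then build the labels
long <- long[!is.na(long$value), ]
variables <- paste(long$varname, long$name, sep = "_")
variables
```

The long table also keeps each value next to its label, which may be handy if later steps need the direction and value columns too.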
Aaron