
I have this code to add new columns to a data frame:

for(i in 1:length(listParms))
{
  parm = as.character(listParms[i])
  lParm = paste0(parm,"_LOG")
  dataSubset[,lParm] = apply(dataSubset,1, function(row){
    if(parm %in% names(dataSubset)){
      if(grep("0",row[parm],fixed=T) >= 0) 0
      else NA
    }
    else NA
  })
}

listParms is a list of parameter column names in the dataSubset data.frame; for each one, I want to add a new column named <parm>_LOG.

I am getting the following error:

Error in if (grep("0", row[parm], fixed = T) >= 0) 0 : 
    argument is of length zero

listParms contains something like "PARM1", "PARM2", "PARM3", "PARM4", "PARM5", and dataSubset is a data.frame like:

MATERIAL     TEST_SEQ    PARM1     PARM2     PARM3     PARM4     PARM5
Math             1        0001      0010      0100                0000  
Math             2        1100      1110      1111      1200      0200 
Math             3        2211                1022      2112      1202
Science          1        1112      0111      0110      0011      2001
Science          2        0122      2111      1222      0022      2010

Desired output:

MATERIAL     TEST_SEQ    PARM1     PARM2     PARM3     PARM4     PARM5     PARM1_LOG    PARM2_LOG    PARM3_LOG    PARM4_LOG    PARM5_LOG
Math             1        0001      0010      0100                0000      0            0            0            NA           0
Math             2        1100      1110      1111      1200      0200      0            0            NA           0            0
Math             3        2211                1022      2112      1202      NA           NA           0            NA           0
Science          1        1112      0111      0110      0011      2001      NA           0            0            0            0
Science          2        0122      2111      1222      0022      2010      0            NA           NA           0            0

Can anyone help me understand why? Thank you.

Gavin Simpson
Ianthe
  • Please provide the data you are operating on; your example is not [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – tonytonov Feb 19 '14 at 06:13
  • Initial guess: `row[parm]` may be something like `numeric(0)`. – tonytonov Feb 19 '14 at 06:14
  • @tonytonov: `row[parm]` can be empty, but it mostly has values like "0002111". – Ianthe Feb 20 '14 at 01:42
  • Change your `grep` to `ifelse(grepl("0", x[parm], fixed=T), 0, NA)`. Note `grepl`, not `grep`. As @tonytonov suggested, if you use `grep` to find something in an empty string, you will get, e.g., `integer(0)`. – jbaums Feb 20 '14 at 03:18
  • Why do you have `1` in your desired output? Under what conditions would you expect the result to be `1`? Your `if` statements only ever return `0` or `NA`. – jbaums Feb 20 '14 at 03:23
  • @jbaums Sorry, that is a typo; it should be NA instead of 1. – Ianthe Feb 20 '14 at 06:03

1 Answer


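When grep finds no match, it returns integer(0) rather than a position. That happens for an empty string, and also for any value that simply does not contain "0" (e.g. "1111"). integer(0) >= 0 is logical(0), and if() cannot evaluate a zero-length condition, hence "argument is of length zero". Instead of grep, use grepl, which returns a logical and takes the value FALSE whenever the pattern is not found in the string, whether or not the string is empty.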
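A quick way to see the difference in an R session (a minimal illustration using literal strings rather than the question's data; the variable names are only for this example):

```r
# grep() returns the positions of matching elements, so when nothing
# matches (an empty string, or a string without "0") it returns integer(0).
g_empty   <- grep("0", "", fixed = TRUE)
g_nomatch <- grep("0", "1111", fixed = TRUE)
print(g_empty)
print(g_nomatch)

# Comparing a zero-length vector gives a zero-length logical ...
print(g_empty >= 0)

# ... which is exactly what if() rejects. Capture the error message:
msg <- tryCatch(if (g_empty >= 0) 0 else NA,
                error = function(e) conditionMessage(e))
print(msg)

# grepl() returns one TRUE/FALSE per element instead, so it is always
# safe to use as the condition for a single value.
hits <- grepl("0", c("", "1111", "0001"), fixed = TRUE)
print(hits)
```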

Reproducing your data:

d <- read.table(text='MATERIAL     TEST_SEQ    PARM1     PARM2     PARM3     PARM4     PARM5
Math             1        0001      0010      0100      NA        0000  
Math             2        1100      1110      1111      1200      0200 
Math             3        2211      NA        1022      2112      1202
Science          1        1112      0111      0110      0011      2001
Science          2        0122      2111      1222      0022      2010', 
                header=T, colClasses='character')

d[is.na(d)] <- ''

Solving your problem:

listParms <- paste0('PARM', 1:5)

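for(i in seq_along(listParms)) {
  parm <- as.character(listParms[i])
  lParm <- paste0(parm, "_LOG")
  # apply() passes each row of d to the function as a named character vector x
  d[, lParm] <- apply(d, 1, function(x){
    if(parm %in% names(d)) {
      # grepl() always returns TRUE or FALSE, so ifelse() never receives a
      # zero-length condition (unlike grep(), which can return integer(0))
      ifelse(grepl("0", x[parm], fixed=T), 0, NA)
    } else {
      NA
    }
  })
}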

For kicks, here's an alternative, vectorized approach to creating the new columns, which could then be combined with the original data.frame using cbind:

listParmsSub <- listParms[listParms %in% names(d)]
# d[listParmsSub] always returns a data.frame, even if only one column is selected
ifelse(do.call(cbind, 
        setNames(lapply(d[listParmsSub], function(x) {
          grepl("0", x, fixed=TRUE)
        }), paste0(listParmsSub, '_LOG'))), 
       0, NA)
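Instead of building a matrix and cbind-ing it, the new columns can also be assigned directly by name. A minimal sketch, assuming a small character data frame built from part of the question's data (d_small and cols are illustrative names, not from the answer):

```r
# Small stand-in for the question's data (empty cells stored as "")
d_small <- data.frame(
  MATERIAL = c("Math", "Math"),
  TEST_SEQ = c("1", "2"),
  PARM1 = c("0001", "1100"),
  PARM4 = c("", "1200"),
  stringsAsFactors = FALSE
)
listParms <- paste0("PARM", 1:5)

# Keep only the parameters that actually exist as columns
cols <- listParms[listParms %in% names(d_small)]

# lapply() over d_small[cols] yields one 0/NA vector per column;
# assigning that list to d_small[new_names] adds all new columns at once.
d_small[paste0(cols, "_LOG")] <- lapply(d_small[cols], function(x) {
  ifelse(grepl("0", x, fixed = TRUE), 0, NA)
})
print(d_small)
```

This avoids the intermediate matrix, so the new columns keep their own numeric type instead of going through cbind.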

To extend this to allow multiple conditions, you could use nested ifelse statements, e.g.:

# Each nested ifelse() mirrors one "else if": the first matching condition
# wins, and values matching none of them get 1.
do.call(cbind, 
        setNames(lapply(d[listParmsSub], function(x) {
          ifelse(x == '', NA, 
            ifelse(grepl('0', x, fixed=TRUE), 0, 
              ifelse(grepl('4', x, fixed=TRUE), NA, 
                ifelse(grepl('59', x, fixed=TRUE), 0, 1))))
        }), paste0(listParmsSub, '_LOG')))
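Deeply nested ifelse() calls get hard to read as rules are added. One alternative (a sketch, not the method from the answer) is to assign values in reverse priority order, so that higher-priority rules overwrite lower-priority ones. The helper name classify_parm and the data frame d2 are hypothetical; the rules are the asker's: empty gives NA, contains 0 gives 0, contains 4 gives NA, contains 59 gives 0, otherwise 1 (the redundant '04' rule is omitted).

```r
# Hypothetical helper: maps each value to 0, 1 or NA using the asker's rules.
# Rules are applied from LOWEST to HIGHEST priority, so each later
# assignment overrides earlier ones, reproducing the "first match wins"
# order of the nested ifelse() version.
classify_parm <- function(x) {
  out <- rep(1, length(x))                  # default: no rule matched
  out[grepl("59", x, fixed = TRUE)] <- 0    # contains "59"
  out[grepl("4", x, fixed = TRUE)] <- NA    # contains "4"
  out[grepl("0", x, fixed = TRUE)] <- 0     # contains "0"
  out[is.na(x) | x == ""] <- NA             # empty or missing
  out
}

vals <- c("", "0001", "1412", "1590", "4159", "1111", NA)
res <- classify_parm(vals)
print(res)

# Applying it to several columns at once (hypothetical data frame d2)
d2 <- data.frame(PARM1 = c("0001", "1459"), PARM2 = c("", "2222"),
                 stringsAsFactors = FALSE)
cols <- c("PARM1", "PARM2")
d2[paste0(cols, "_LOG")] <- lapply(d2[cols], classify_parm)
print(d2)
```

Adding a rule means adding one line at the right priority position rather than another level of nesting.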
jbaums
  • Thanks a lot, jbaums :) In the vectorized approach, is it applicable if there is more than one if? Like if ... then 0 else if ... then 1 else if ... then 2 else NA? – Ianthe Feb 20 '14 at 06:08
  • It would require modification, but yes it's possible. What are your additional conditions? – jbaums Feb 20 '14 at 07:19
  • Here are all the conditions: if row[parm] is empty then NA, else if row[parm] contains 0 then 0, else if row[parm] contains 4 then NA, else if row[parm] contains '04' then 0, else if row[parm] contains '59' then 0, else 1. – Ianthe Feb 20 '14 at 08:03
  • I noticed that your condition `if row[parm] contains '04' then 0` is redundant, since any string containing `04` also contains `0` and is already caught by the earlier `0` test. – jbaums Feb 20 '14 at 23:50
  • I've edited the answer to include some code to check all your conditions. – jbaums Feb 21 '14 at 00:47
  • Thanks, jbaums, that's great! It saved me lots of time. :) – Ianthe Feb 21 '14 at 01:43