5

I am using R and have searched for an answer, but while I have seen similar questions, none of the solutions worked for my specific problem.

In my data set I am using NAs as placeholders, because I will return to them once part of my analysis is done. I would therefore like to do all my calculations as if the NAs weren't there.

Here's my issue, with an example data table:

ROCA = c(1,3,6,2,1,NA,2,NA,1,NA,4,NA)
ROCA <- data.frame (ROCA=ROCA)       # converting it just because that is the format of my original data

#Now my function
exceedes <- function (L=NULL, R=NULL, na.rm = T)
 {
    if (is.null(L) | is.null(R)) {
        print ("mycols: invalid L,R.")
        return (NULL)               
    }
    test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))
  test1 <- sapply(L,function(x) if((x)> test){1} else {0})
  return (test1)
}
L=ROCA[,1]
R=.5
ROCA$newcolumn <- exceedes(L,R)
names(ROCA)[names(ROCA)=="newcolumn"]="Exceedes1"

I am getting the error:

Error in if ((x) > test) { : missing value where TRUE/FALSE needed 

As far as I can tell, something is wrong with the sapply call. Any ideas on how to ignore those NAs? I would try na.omit if I could reinsert all the NAs exactly where they were before, but I am not sure how to do that.

Sven Hohenstein
Tim
  • Why not just add another if statement into the sapply function that returns NA if x is NA? Also, if you put `browser()` anywhere in your function, it will pause at that place when you run it the next time. – Roman Luštrik Jun 27 '11 at 23:18
  • Thanks for the response! I am not sure I did this right, however, because I am still getting the same error. Here is my code: `test1 <- sapply(L,function(x) if ((x) == NA) {NA} else if((x)> test){1} else {0} )` and the error is now: `Error in if ((x) == NA) { : missing value where TRUE/FALSE needed` – Tim Jun 27 '11 at 23:30
  • You must use `is.na(x)` to check it. `x == NA` returns NA... – Tommy Jun 27 '11 at 23:46
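
To illustrate the point about `is.na`, here is a minimal sketch. The vector `x` and the cutoff `2` are made up for illustration and are not taken from the question's data.

```r
x <- c(1, NA, 3)   # hypothetical example vector

x == NA            # comparing with NA gives NA for every element
is.na(x)           # element-wise test for missingness

# Inside if(), a condition that evaluates to NA raises
# "missing value where TRUE/FALSE needed", so the NA check
# has to come first and has to use is.na().
flag <- sapply(x, function(v) if (is.na(v)) NA else if (v > 2) 1 else 0)
flag
```

The NA stays in its original position, so the result still lines up with the input.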

3 Answers

5

There's no need for sapply and your anonymous function because > is already vectorized.

It also seems really odd to specify default argument values that are invalid. My guess is that you're using that as a kludge instead of using the missing function. It's also good practice to throw an error rather than return NULL because you would still have to try to catch when the function returns NULL.

exceedes <- function (L, R, na.rm=TRUE)
{
  if(missing(L) || missing(R)) {
    stop("L and R must be provided")
  }
  test <- mean(L,na.rm=TRUE)-R*sd(L,na.rm=TRUE)
  as.numeric(L > test)
}

ROCA <- data.frame(ROCA=c(1,3,6,2,1,NA,2,NA,1,NA,4,NA))
ROCA$Exceeds1 <- exceedes(ROCA[,1],0.5)
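
As a quick check of how the vectorized version treats missing values, the sketch below repeats the function so it runs on its own and applies it to the example data. The names `res` and `err` are only illustrative.

```r
# Same definition as above, repeated so the snippet is self-contained
exceedes <- function(L, R, na.rm = TRUE) {
  if (missing(L) || missing(R)) {
    stop("L and R must be provided")
  }
  test <- mean(L, na.rm = TRUE) - R * sd(L, na.rm = TRUE)
  as.numeric(L > test)
}

L <- c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA)
res <- exceedes(L, 0.5)
res                        # NA wherever L is NA, otherwise 0 or 1
length(res) == length(L)   # rows still line up with the input

# Calling without arguments signals an error instead of returning NULL
err <- tryCatch(exceedes(), error = function(e) conditionMessage(e))
err
```

Because `>` returns NA for NA inputs, the placeholders survive without any explicit handling.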
Joshua Ulrich
  • The advantage of using null is that it's always easy to explicitly pass in. In some situations generating "missing" arguments is a pain. – hadley Jun 28 '11 at 12:39
  • @hadley: I agree (that's how `plot.default` handles several arguments) but I was referring to this specific situation where `NULL` argument values are invalid. – Joshua Ulrich Jun 28 '11 at 13:33
4

This statement is strange:

test1 <- sapply(L,function(x) if((x)> test){1} else {0})

Try:

test1 <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
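
A small sketch of how `ifelse` behaves here, assuming a made-up vector and threshold: a missing value in the condition already yields NA in the result, so the outer `is.na` check mainly documents the intent.

```r
L <- c(1, 3, NA, 2)   # hypothetical data
test <- 1.5           # hypothetical threshold

nested <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
single <- ifelse(L > test, 1, 0)

nested
single
all.equal(nested, single)   # both keep the NA in position 3
```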
jimmyb
  • Can't thank both of you enough. Really appreciate the quick feedback! – Tim Jun 27 '11 at 23:41
  • I'm not sure if it's appropriate to ask a separate but related question here. Thanks to everyone's help, I want to make one small tweak. Certain parts of my data have blanks. I want to specify that if two columns both have blanks, then columns 5 through 10 get the value NA. The code I have tried is this (I surely need to review my if statements): a<- if(a[,10]&a[,11]=="" is.na(a[,5:10]) I get Error: unexpected symbol in "a<- if(a[,10]&a[,11]=="" is.na" – Tim Jun 28 '11 at 00:11
  • 'if' is a control structure. You probably want 'ifelse' which returns a vector. – IRTFM Jun 28 '11 at 02:14
  • I think you need something like `a[a[,10]=="" & a[,11]=="",5:10] <- NA`. `is.na` is for testing that a variable is NA, not for setting it to NA. – Ben Bolker Jun 28 '11 at 02:24
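
The indexed assignment suggested above can be sketched on a small data frame. The data frame `a` below is entirely hypothetical (3 rows, 11 columns, text in columns 10 and 11), and wrapping the condition in `which()` is an added precaution so that an NA in either column does not end up in the row index.

```r
# Hypothetical data: V1..V9 numeric, V10 and V11 character
a <- as.data.frame(matrix(1:27, nrow = 3))
a$V10 <- c("", "x", "")
a$V11 <- c("", "", "y")

# Rows where both column 10 and column 11 are blank
blank <- a[, 10] == "" & a[, 11] == ""

# which() keeps only the TRUE positions and drops any NA
a[which(blank), 5:10] <- NA
a
```

Only the first row has blanks in both columns, so only that row is set to NA in columns 5 through 10.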
2

Do you want NAs in the result? That is, do you want the rows to line up?

If so, just returning L > test should work. Adding the column can be simplified too (I suspect "Exceedes1" is stored in a variable somewhere).

exceedes <- function (L=NULL, R=NULL, na.rm = T)
 {
    if (is.null(L) | is.null(R)) {
        print ("mycols: invalid L,R.")
        return (NULL)               
    }
    test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))

    L > test
}
L=ROCA[,1]
R=.5
ROCA[["Exceedes1"]] <- exceedes(L,R)
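
Since this version returns a logical vector, a short sketch of working with it may help. It recomputes the threshold on the question's example data; the names `flag`, `n_exceed` and `flag01` are illustrative.

```r
L <- c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA)
test <- mean(L, na.rm = TRUE) - 0.5 * sd(L, na.rm = TRUE)

flag <- L > test                     # TRUE / FALSE / NA, same length as L
n_exceed <- sum(flag, na.rm = TRUE)  # count exceedances, skipping the NAs
flag01 <- as.integer(flag)           # 1 / 0 / NA if numeric codes are needed

n_exceed
flag01
```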
Tommy