I'm new to R. I'm trying to set a new column in my data frame depending on what's in 3 other columns. I've looked at other queries like:

Populate a column using if statements in r

I thought this would solve it, but it looks like I can only give sapply a single vector. When I try the following code:

IHC <- c("N","N","Y","N","N")
CCD <- c("13-Nov-2009", NA, "09-Feb-2011", "10-Dec-2012", "16-Nov-2009")
IHE <- c(NA, "20-Feb-2011",NA,NA,NA)
df1 <- data.frame(IHC, CCD, IHE)

InHouse <- function(IHC,CCD,IHE) {
  if(IHE == "" &&  CCD == NA | IHC == "N") y <- ""
  if(IHE == "") y <- CCD
  if(CCD > IHE) y <- IHE
  else y <- CCD
  return(y)
}

df1$AAA <- sapply(c(df1$IHC, df1$CCD, df1$IHE), InHouse)

I get the following error:

Error in IHE == "" : 'IHE' is missing

Any help would be great.

  • have a quick look at ?is.na. Also a few posts on SO, http://stackoverflow.com/questions/20614735/double-if-conditioning-in-r-language-syntax/20615066#20615066 – user20650 Mar 07 '14 at 18:06
  • Can you please describe in words what you want to achieve and show the expected output? Also, you don't want `if` and `sapply`. You want vectorized functions like `ifelse` or logical subsetting. – Roland Mar 07 '14 at 18:52
  • I guess you're trying to do something like `mapply(InHouse, df1$IHC, df1$CCD, df1$IHE)`. But there are significantly wrong parts in the code in general, which come from the fact that `== NA` or `NA ==` produces `NA`, and an `NA` condition inside an `if` statement produces an error. – alexis_laz Mar 07 '14 at 19:08

1 Answer

There are several issues.

  1. Your conditions involve comparisons like: IHE=="". IHE is NA but never "". So I assume you want is.na(IHE)??
  2. You are mixing the scalar form of and (&& instead of &) with the vectorized form of or (| instead of ||). Why??
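  3. The comparison CCD > IHE yields NA if either value is NA (which is the case in every row of your sample data). Also, since both columns hold character strings, > compares them alphabetically, not chronologically.
  4. The logical operators & and && bind more tightly than | and ||, so IHE == "" && CCD == NA | IHC == "N" is equivalent to (IHE == "" && CCD == NA) | IHC == "N". Is that what you want??
  5. Most important, your conditions are not mutually exclusive: the if statements run one after another, so a later one can overwrite the y set by an earlier one.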
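
A small sketch of how R behaves in the first four points, using throwaway variable names (x, res, and so on) that are not part of the question's code:

```r
# Comparing anything with NA yields NA, never TRUE or FALSE
x <- NA
cmp_empty  <- x == ""     # NA
cmp_na     <- x == NA     # NA
is_missing <- is.na(x)    # TRUE: this is the test to use

# if() needs a single TRUE or FALSE, so an NA condition raises an error
res <- tryCatch(
  if (x == "") "empty" else "not empty",
  error = function(e) "error"
)

# & works element-wise on whole vectors; && is meant for single values
elementwise <- c(TRUE, FALSE) & c(TRUE, TRUE)   # TRUE FALSE

# & is evaluated before |, so this is parsed as TRUE | (FALSE & FALSE)
grouped <- TRUE | FALSE & FALSE                 # TRUE
```

If | were evaluated first, the last expression would be (TRUE | FALSE) & FALSE, which is FALSE.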

This is a way to apply the conditions without the use of any of the apply(...) functions.

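df1 <- data.frame(IHC, CCD, IHE, stringsAsFactors=FALSE)
# Default: start from CCD; each later step can overwrite earlier ones
df1$AAA <- df1$CCD
# Both dates missing, or IHC is "N": blank
cond <- with(df1, is.na(IHE) & is.na(CCD) | IHC == "N")
df1$AAA[cond] <- ""
# IHE missing: use CCD
cond <- is.na(df1$IHE)
df1$AAA[cond] <- df1$CCD[cond]
# Both dates present and CCD > IHE: use IHE
cond <- with(df1, !is.na(CCD) & !is.na(IHE) & CCD > IHE)
df1$AAA[cond] <- df1$IHE[cond]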
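
If the dates should be compared chronologically, one possible variation is to parse them with as.Date first. This is only a sketch under an assumed rule (blank when both dates are missing, otherwise the earlier of the dates that are present); the question does not state the intended rule, and IHC is left out here because its role is unclear. The names ccd, ihe and earliest are made up for illustration.

```r
IHC <- c("N", "N", "Y", "N", "N")
CCD <- c("13-Nov-2009", NA, "09-Feb-2011", "10-Dec-2012", "16-Nov-2009")
IHE <- c(NA, "20-Feb-2011", NA, NA, NA)
df1 <- data.frame(IHC, CCD, IHE, stringsAsFactors = FALSE)

# Parse the strings as dates so that comparisons are chronological.
# Note: %b matches month abbreviations in the current locale (English assumed).
ccd <- as.Date(df1$CCD, format = "%d-%b-%Y")
ihe <- as.Date(df1$IHE, format = "%d-%b-%Y")

# Earlier of the two dates in each row; with na.rm = TRUE a row that has
# only one date gets that date, and a row with none stays NA.
earliest <- pmin(ccd, ihe, na.rm = TRUE)

# Assumed rule: blank when both dates are missing, otherwise the earliest date.
df1$AAA <- ifelse(is.na(earliest), "", format(earliest, "%d-%b-%Y"))
```

pmin works element-wise across the two vectors, so no if statement or apply-style loop is needed.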
jlhoward