My goal is to categorize the rows in my dataset depending on the values of two different dates.

if(!exists(MY_DATA$Date_1) & exists(MY_DATA$Date_2)) {
  MY_DATA$NEW_COL <- c("Category_1")
} else {
  MY_DATA$NEW_COL <- c("Category_2")
}

But it isn't working, so I'm currently trying a simplified version:

if(!exists(MY_DATA$Date_1)){
  MY_DATA$NEW_COL <- c("Category_1")
}

However, it seems that this only reads the value on the first row, and it either gives me a column with all values as Category_1 or no column at all.

Also I have tried this with is.na(), is.null() and exists().

Gorka
JC Cantu
  • Welcome to SO! Please include an example of your data with `dput` for a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – starja Jul 16 '20 at 18:08
  • `if` is not vectorized the way you are using it. You want the `ifelse` function: `MY_DATA$NEW_COL <- ifelse(!exists(MY_DATA$Date_1),"Category_1","Category_2")` – Daniel O Jul 16 '20 at 18:17
  • `exists(MY_DATA$Date_1)` is only meaningful if: `MY_DATA` is exactly one row; `Date_1` contains strings; and those strings point to variables in the local environment or within the search path. Otherwise, perhaps you need `"Date_1" %in% names(MY_DATA)`. – r2evans Jul 16 '20 at 18:24
  • Also, the only time it is appropriate to use `&` (single) in an `if` clause is when you wrap it in an aggregating function like `any` or `all`; otherwise it *might* work as you need but can very easily fail. Why? `&` returns a logical vector of length 0 or more, whereas `if` **requires** length exactly 1. – r2evans Jul 16 '20 at 18:25

2 Answers


However, it seems that this only reads the value on the first row, and it either gives me a column with all values as Category_1 or no column at all.

This is because an if statement requires a condition of length 1. When given a longer vector, older versions of R use only the first element (with a warning) to decide between TRUE and FALSE; since R 4.2.0 this is an error.
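
If a single yes/no decision about the whole column is really what you want, one option is to collapse the logical vector to length 1 before handing it to if. A minimal sketch, assuming missing dates are stored as NA (the toy MY_DATA below is made up for illustration):

```r
# Hypothetical toy data; NA marks a missing date (an assumption).
MY_DATA <- data.frame(
  Date_1 = as.Date(c("2020-01-05", NA, "2020-03-10")),
  Date_2 = as.Date(c("2020-02-01", "2020-02-15", NA))
)

# is.na() returns one logical per row, so it cannot go into `if` directly.
missing_1 <- is.na(MY_DATA$Date_1)

# Collapse to a single TRUE/FALSE with any() or all(), then branch on it.
if (any(missing_1)) {
  message("Date_1 has ", sum(missing_1), " missing value(s)")
}
if (all(!is.na(MY_DATA$Date_2))) {
  message("Date_2 is complete")
}
```

This answers a question about the column as a whole; it does not label individual rows.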

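The ifelse function accepts a logical vector as its test and returns a vector of the same length, taking each element from the yes or the no argument depending on whether the corresponding test value is TRUE or FALSE. It may be suitable for your needs.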
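
As a rough sketch of how that could look for the two-date condition in the question, assuming that a date that "does not exist" is stored as NA (the sample rows are invented, and the rule follows the question's first snippet):

```r
# Hypothetical toy data covering all four NA / non-NA combinations.
MY_DATA <- data.frame(
  Date_1 = as.Date(c(NA, "2020-01-05", NA, "2020-03-10")),
  Date_2 = as.Date(c("2020-02-01", "2020-02-15", NA, NA))
)

# Row by row: Category_1 when Date_1 is missing and Date_2 is present,
# Category_2 in every other case. `&` is the element-wise AND.
MY_DATA$NEW_COL <- ifelse(
  is.na(MY_DATA$Date_1) & !is.na(MY_DATA$Date_2),
  "Category_1",
  "Category_2"
)

print(MY_DATA)
```

Here only the first row should come out as Category_1. Swap the yes and no values if your categories are meant the other way round.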


Rephrasing what was originally a comment by @r2evans: exists() checks whether a variable is already defined in the R environment. It takes a character vector of length 1 as its argument; otherwise it will check only the first member.

a = 1
b = 1
exists("a")
[1] TRUE

exists(c("a", "b"))
[1] TRUE

exists(c("ab", "a", "b"))
[1] FALSE

However, it's worth noting that exists() does not check whether a value is inside a vector. If that is what you are trying to do, you'll want the %in% operator instead.
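
To illustrate the difference, here is a small sketch of %in% used both for column names (as suggested in the comments) and for values. The data frame and the dates in it are made up:

```r
# Hypothetical one-column data frame; there is deliberately no Date_2 column.
MY_DATA <- data.frame(
  Date_1 = c("2020-01-05", NA),
  stringsAsFactors = FALSE
)

# Is there a column with this name?
has_date_1 <- "Date_1" %in% names(MY_DATA)
has_date_2 <- "Date_2" %in% names(MY_DATA)

# Is each of these values present somewhere in the column?
found <- c("2020-01-05", "2021-01-01") %in% MY_DATA$Date_1

print(c(has_date_1, has_date_2))  # TRUE FALSE
print(found)                      # TRUE FALSE
```

Unlike exists(), %in% is vectorized over its left-hand side, so it returns one result per value tested.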


The right solution will largely depend on what exactly you need to check.

P.S. This was originally intended as a comment, but it is too long for one.

Nuclear03020704

Thanks everyone for your support; ifelse did the trick.

The following worked for me:

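# Default value; the ifelse() call below overwrites every row anyway
MY_DATA$NEW_COL <- "Category_2"
# Category_1 where Date_1 is not NA, Category_2 otherwise
MY_DATA$NEW_COL <- ifelse(!is.na(MY_DATA$Date_1), "Category_1", "Category_2")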
JC Cantu