I want to conditionally generate output in a column based on values in two other columns using data.table. This is something I want to achieve using a function containing 'if' statements specifically, as I'm just trying to learn how to use data.table. The if statement contains shift()
, and I think that this may be the cause of my problems.
What I have tried so far is the following:
library(data.table)
#Data
DT <- data.table(V1 = sample(LETTERS[1:3], 20, replace = TRUE),
V2 = sample(1:5, 20, replace = TRUE))
#function
fun1 <- function(x, y){
if(x == "C" & shift(y, type = "lead") > y){
return("Greater")
} else if(x == "C" & shift(y, type = "lead") < y){
return("Lesser")
} else{
return(NA)
}
}
#function implementation
DT.v1 <- DT[, V3 := mapply(fun1, x = V1, y = V2)]
If I run the above, I get the error:
Error in if (x == "C" & shift(y, type = "lead") > y) { :
missing value where TRUE/FALSE needed
I had an inkling that this error may be caused by comparison with NA
in the last iteration as shift(y, type = "lead")
would equal NA
. Addition of the condition !is.na(shift(y, type = "lead"))
in the 'if' statement did stop the error from appearing but leads to generation of only NULL
values.
I have been able to generate my desired output (see script below) but would also like to learn how i can achieve this using 'if' statements in a function.
DT.v2 <- DT[V1 == "C" & shift(V2, type = "lead") > V2, V3 := "Greater"][
V1 == "C" & shift(V2, type = "lead") < V2, V3 := "Lesser"]
#or an alternative way to generate the desired output:
DT.v3 <- DT[, V3 := ifelse(V1 == "C" & shift(V2, type = "lead") > V2, "Greater",
ifelse(V1 == "C" & shift(V2, type = "lead") < V2, "Lesser", NA))]
Can someone help me understand how to implement the function in a correct way? Thanks for taking the time to help!