The first half of this answer expands on and tries to explain @Joran's excellent comment/answer. It is mainly an exercise for my own understanding, but hopefully it helps someone else too (and I'm happy to have my understanding corrected).
The second half shows a couple of other non-base solutions that could be used in more complex situations.
Joran's answer
c('not ok','ok')[(is.numeric(df[[1]]) & (df[[2]] != 'b')) + 1]
From ?data.frame
A data frame is a list of variables
so the data.frame itself is a list, and each column/variable is one element of that list (usually a vector)
From ?`[` and this question on the difference between [ and [[ we note that
For lists, one generally uses [[ to select any single element, whereas [ returns a list of the selected elements.
Therefore, using [[ in this solution selects a single element of the list
df[[1]] ## select the 1st column as a single element (which is a vector)
# [1] 0 1 0 1
df[[2]] ## select the 2nd column as a single element (which is a vector)
# [1] a b c d
## note that df[1] would return the first column as a data.frame (which is a list), not a vector
## we can see that by
# > str(df[1])
# 'data.frame': 4 obs. of 1 variable:
# $ a: num 0 1 0 1
# > str(df[[1]])
# num [1:4] 0 1 0 1
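The same distinction can be checked programmatically. This is a minimal sketch; the data frame is an assumed reconstruction of the one in the question, based on the output shown above.

```r
# Assumed reconstruction of the question's data frame
df <- data.frame(a = c(0, 1, 0, 1),
                 b = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

is.list(df)                    # TRUE: the data frame itself is a list of columns
is.data.frame(df[1])           # TRUE: `[` keeps the data frame (list) wrapper
is.atomic(df[[1]])             # TRUE: `[[` extracts the column as a plain vector
identical(df[[1]], df$a)       # TRUE: `$` extracts the same vector by name
identical(df[[2]], df[["b"]])  # TRUE: `[[` also accepts a column name
```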
With these two vectors now selected, we can perform the vectorised logical check on them
is.numeric(df[[1]]) & (df[[2]] != 'b')
# [1]  TRUE FALSE  TRUE  TRUE
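One detail worth spelling out: is.numeric() looks at the type of the whole column and returns a single value, whereas != compares row by row. The & operator then recycles the single value across the rows. A small sketch, again assuming the data frame from the question:

```r
# Assumed reconstruction of the question's data frame
df <- data.frame(a = c(0, 1, 0, 1),
                 b = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

num_check <- is.numeric(df[[1]])  # one value for the whole column
not_b     <- df[[2]] != "b"       # one value per row

length(num_check)  # 1
length(not_b)      # 4

# `&` recycles the single TRUE across all four rows
combined <- num_check & not_b
combined
# [1]  TRUE FALSE  TRUE  TRUE
```

So the is.numeric() part of the condition is either TRUE for every row or FALSE for every row; it does not test individual values.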
From ?logical we have
...with TRUE being mapped to 1L, FALSE to 0L...
so essentially TRUE == 1L and FALSE == 0L, which we can see by
sum(c(TRUE, TRUE, FALSE, TRUE))
# [1] 3
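The mapping can also be inspected directly. A short sketch (the variable names here are only illustrative):

```r
flags <- c(TRUE, TRUE, FALSE, TRUE)

as_int <- as.integer(flags)  # explicit coercion
as_int
# [1] 1 1 0 1

TRUE == 1L         # TRUE
FALSE == 0L        # TRUE
class(flags + 1L)  # "integer": arithmetic coerces logicals implicitly
```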
Now, taking a vector of our choices
c("not ok", "ok")
# [1] "not ok" "ok"
we can use [ again to select each element
c("not ok", "ok")[1]
# [1] "not ok"
c("not ok", "ok")[2]
# [1] "ok"
c("not ok", "ok")[3]
# [1] NA
## Because there isn't a 3rd element
c("not ok", "ok")[0]
# character(0) ## empty
## and we can use a vector to select each element
c("not ok", "ok")[c(1,2,1,3)]
# [1] "not ok" "ok" "not ok" NA
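Two indexing behaviours are worth seeing before the next step, since they explain why the logical result cannot be used as an index directly. A 0 inside an index vector is silently dropped, and a logical vector used as an index does logical subsetting rather than position lookup. A sketch with illustrative names:

```r
choices <- c("not ok", "ok")
flags   <- c(TRUE, FALSE, TRUE, TRUE)

# 0/1 positions: the 0 is dropped, so the result is too short
dropped <- choices[as.integer(flags)]
dropped
# [1] "not ok" "not ok" "not ok"

# Logical index: keeps positions where the index is TRUE (positions 3 and 4 do not exist)
logical_idx <- choices[flags]
logical_idx
# [1] "not ok" NA       NA

# Shifting to 1/2 positions gives one label per element
shifted <- choices[flags + 1]
shifted
# [1] "ok"     "not ok" "ok"     "ok"
```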
Which also means we can use our logical comparison from earlier to subset the choices. However, as FALSE is mapped to 0L, we need to add 1 to it so it will be able to select from the vector
c(TRUE, FALSE, TRUE, TRUE) + 1
# [1] 2 1 2 2
which gives
c("not ok", "ok")[c(2,1,2,2)]
# [1] "ok" "not ok" "ok" "ok"
Which now gives us the information we want to include in our original data.frame
df$c <- c("not ok", "ok")[c(2,1,2,2)]
df
#   a b      c
# 1 0 a     ok
# 2 1 b not ok
# 3 0 c     ok
# 4 1 d     ok
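Putting the steps together, the whole thing can be run end to end and compared against a plain ifelse(). This is a sketch under the assumption that the data frame looks like the one reconstructed below; `ok` and `c_ifelse` are names introduced here for illustration.

```r
# Assumed reconstruction of the question's data frame
df <- data.frame(a = c(0, 1, 0, 1),
                 b = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

# The logical condition, one value per row
ok <- is.numeric(df[[1]]) & (df[[2]] != "b")

# Index-based labelling: FALSE + 1 picks element 1, TRUE + 1 picks element 2
df$c <- c("not ok", "ok")[ok + 1]

# The equivalent base ifelse() call, for comparison
c_ifelse <- ifelse(ok, "ok", "not ok")

identical(df$c, c_ifelse)  # TRUE
df
```

Both forms return NA for any row where the condition itself is NA, so they behave the same on missing values.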
Non-base solutions
## a dplyr version, still using an ifelse construct
library(dplyr)
df %>%
  mutate(c = ifelse(is.numeric(a) & b != "b", "ok", "not ok"))
## a couple of data.table versions using by-reference updates (:=)
library(data.table)
## using an ifelse
setDT(df)[, c := ifelse(is.numeric(a) & b != "b", "ok", "not ok")]
## using filters in i
## (assumes column c does not exist yet, so unmatched rows start as NA)
setDT(df)[is.numeric(a) & b != "b", c := "ok"][is.na(c), c := "not ok"]