Up front: I think using ifelse for this problem is strongly ill-advised. It requires significant nesting, which sacrifices both performance and readability. Though the two solutions below may be a little harder if you aren't familiar with mapply or table joins, the payoff in stability and performance will far outweigh the time it takes to learn these techniques.
Two methods:
Lookup matrix
One way is to define lookup matrices, where the row names reflect the possible V1 values and the column names reflect the possible V2 values. (Note that when indexing these lookup matrices, you must wrap numeric/integer values in as.character; otherwise R treats them as positional indices, i.e. row/column numbers, instead of matching them against the row/column names.)
Examples:
dat <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2)
)
dmnms <- list(c(0,1,2), c(0,1,2))
m3 <- matrix(c(0,  1, 2,
               0, NA, 1,
               0,  0, 0),
             nrow = 3, byrow = TRUE, dimnames = dmnms)
m4 <- matrix(c("AA", "AD", "DD",
               "AB", NA,   "CD",
               "BB", "BC", "CC"),
             nrow = 3, byrow = TRUE, dimnames = dmnms)
m3
#   0  1 2
# 0 0  1 2
# 1 0 NA 1
# 2 0  0 0
m4
#   0    1    2
# 0 "AA" "AD" "DD"
# 1 "AB" NA   "CD"
# 2 "BB" "BC" "CC"
In this case, notice the 0, 1, and 2 in the row/column margins. In a matrix with no names, these are shown as [1,], [2,], etc., indicating that no actual names are available and that only the row number is being displayed. However, since these margins are character names (no brackets/commas), they can be referenced directly, like so:
m3["0","2"]
# [1] 2
m4["1","0"]
# [1] "AB"
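To see why the as.character step matters, here is a minimal sketch that rebuilds m4 from above and indexes it both ways. The names by_position and by_name are only illustrative.

```r
# Rebuild m4 from above so this snippet runs on its own.
dmnms <- list(c(0, 1, 2), c(0, 1, 2))
m4 <- matrix(c("AA", "AD", "DD",
               "AB", NA,   "CD",
               "BB", "BC", "CC"),
             nrow = 3, byrow = TRUE, dimnames = dmnms)

by_position <- m4[1, 2]      # numeric index: 1st row, 2nd column
by_name     <- m4["1", "2"]  # character index: row named "1", column named "2"

by_position
# [1] "AD"
by_name
# [1] "CD"

# A numeric 0 is not the name "0"; as a position it selects nothing.
length(m4[0, 1])
# [1] 0
```

The two calls hit different cells because the names start at "0" while positions start at 1.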
From here, you just need to map these lookups into new columns, something like:
dat$V3 <- mapply(`[`, list(m3), as.character(dat$V1), as.character(dat$V2))
dat$V4 <- mapply(`[`, list(m4), as.character(dat$V1), as.character(dat$V2))
dat
#   V1 V2 V3   V4
# 1  0  0  0   AA
# 2  0  1  1   AD
# 3  0  2  2   DD
# 4  1  0  0   AB
# 5  1  1 NA <NA>
# 6  1  2  1   CD
# 7  2  0  0   BB
# 8  2  1  0   BC
# 9  2  2  0   CC
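As an alternative to mapply (not part of the approach above, just a sketch of another option), base R also lets you index a matrix with a two-column character matrix, where each row is one (row name, column name) pair. This does all the lookups in a single vectorized call. The variable idx is a name introduced here for illustration.

```r
# Rebuild the data and lookup matrices so this snippet runs on its own.
dat <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2)
)
dmnms <- list(c(0,1,2), c(0,1,2))
m3 <- matrix(c(0,  1, 2,
               0, NA, 1,
               0,  0, 0),
             nrow = 3, byrow = TRUE, dimnames = dmnms)
m4 <- matrix(c("AA", "AD", "DD",
               "AB", NA,   "CD",
               "BB", "BC", "CC"),
             nrow = 3, byrow = TRUE, dimnames = dmnms)

# Each row of idx is one (row name, column name) pair to look up.
idx <- cbind(as.character(dat$V1), as.character(dat$V2))
dat$V3 <- m3[idx]
dat$V4 <- m4[idx]
```

One caveat: with this form of indexing, a V1 or V2 value that has no matching row/column name raises an error instead of returning NA, so it assumes every combination in your data is covered by the lookup matrix.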
Joining data.frame
Another method is to join a known data.frame onto your data. This has the added benefit of expanding easily to more than two criteria. (Technically, the matrix method can expand to more than two as well, in which case it becomes an n-dimensional array, but that is often a little harder to edit, manage, and visualize.)
In your example, this doesn't initially gain you much, since you need to pre-define your data.frame, but I'm guessing that this is just representative data and that your conditional classification runs on much more data.
I'll define the joiner data.frame that will be merged against your actual data. This is the reference data, which maps every combination of V1 and V2 to its respective V3 and V4 values.
joiner <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2),
  V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
  V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
  stringsAsFactors = FALSE
)
I'll create a second sample dataset to demonstrate the merge:
dat2 <- data.frame(
  V1 = c(2, 0, 1, 0),
  V2 = c(0, 1, 2, 2)
)
merge(dat2, joiner, by = c("V1", "V2"))
#   V1 V2 V3 V4
# 1  0  1  1 AD
# 2  0  2  2 DD
# 3  1  2  1 CD
# 4  2  0  0 BB
Edit: if you are concerned about dropping rows, add all.x=TRUE to the merge command. If you use all=TRUE (as you did, based on your comment), that is a full join in SQL parlance, meaning it keeps all rows from both tables even where no match is made. This may be better explained by referencing this answer and noting that I'm suggesting a left join with all.x, which keeps all rows on the left (first argument) and merges in only those rows on the right where a match is made.
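A small sketch of the difference, assuming a hypothetical dat3 (not from your data) that contains one V1/V2 combination missing from joiner:

```r
joiner <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2),
  V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
  V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical data: the (3, 1) combination has no entry in joiner.
dat3 <- data.frame(
  V1 = c(2, 0, 3),
  V2 = c(0, 1, 1)
)

inner <- merge(dat3, joiner, by = c("V1", "V2"))                # default: unmatched row dropped
left  <- merge(dat3, joiner, by = c("V1", "V2"), all.x = TRUE)  # left join: unmatched row kept

nrow(inner)
# [1] 2
nrow(left)
# [1] 3
```

In the left join, the unmatched row is retained with NA in both V3 and V4, which makes uncovered combinations easy to spot afterwards.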
(Note: this can also be done quite easily using the dplyr and data.table packages.)
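For instance, a rough dplyr equivalent of the left join might look like the following. This is a sketch that assumes the dplyr package is installed; the result name joined is only illustrative.

```r
library(dplyr)

joiner <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2),
  V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
  V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  V1 = c(2, 0, 1, 0),
  V2 = c(0, 1, 2, 2)
)

# Left join: keep every row of dat2, pull in V3/V4 where V1 and V2 match.
joined <- left_join(dat2, joiner, by = c("V1", "V2"))
joined$V4
# [1] "BB" "AD" "CD" "DD"
```

Unlike merge, which sorts the result by the key columns by default, left_join keeps the rows in the order of dat2, which can matter if the rest of your code relies on row position.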