Using ifelse statement in R dataframe to generate additional variables

Question

I've got a dataframe in R with the following possible combinations in the first two columns:

 V1| V2| V3| V4
---|---|---|---
 0 | 0 | NA| NA
---|---|---|---
 0 | 1 | NA| NA
---|---|---|---
 0 | 2 | NA| NA
---|---|---|---
 1 | 0 | NA| NA
---|---|---|---
 1 | 1 | NA| NA
---|---|---|---
 1 | 2 | NA| NA
---|---|---|---
 2 | 0 | NA| NA
---|---|---|---
 2 | 1 | NA| NA
---|---|---|---
 2 | 2 | NA| NA

I would like to generate two ifelse statements or one if possible, so that I can get these two additional columns based on the different combinations:

 V1| V2| V3| V4
---|---|---|---
 0 | 0 | 0 | AA
---|---|---|---
 0 | 1 | 1 | AD
---|---|---|---
 0 | 2 | 2 | DD
---|---|---|---
 1 | 0 | 0 | AB
---|---|---|---
 1 | 1 | NA| NA
---|---|---|---
 1 | 2 | 1 | CD
---|---|---|---
 2 | 0 | 0 | BB
---|---|---|---
 2 | 1 | 0 | BC
---|---|---|---
 2 | 2 | 0 | CC

I'm really stuck at this point and none of the options I have tried work.

If I try something like this:

DF$V3 <- if((DF$V1=2) & (DF$V2 = 2)) {DF$V3 = 0}

All values in V1 and V2 are converted into 2 and all values in V3 are converted to 0.

If I use the elseif command in the following way:

DF$V3 <- elseif((DF$V1=2) & (DF$V2 = 2)) {DF$V3 = 0}

I get Error: could not find function "elseif"

I have read several forums about nested if and elseif statements in R, but I'm not able to figure out how to get the results I want using two conditions on two different columns.

Can somebody suggest any options?

Thank you very much,

Best,

Yatrosin

It's `if (conditional) { true-block-1 } else if { true-block-2 } else { false-block }`. There is no function `elseif`, ergo your error message. Are you thinking about `ifelse`? It's a vector-centric conditional. — r2evans, Jul 18 '17 at 22:01
`else if` isn't a stand alone function in R (it can be used with a full if statement). Use `ifelse(test, yes, no)` — Mako212, Jul 18 '17 at 22:02

r2evans · Accepted Answer · 2017-07-18T23:29:46.897

Up front: I think the use of ifelse statements in this problem is strongly ill-advised. It requires significant nesting, sacrificing performance and readability. Though these two solutions may be a little harder if you aren't familiar with mapply or table-join-calculus, the payoff in stability and performance will far outweigh the time to learn these techniques.

Two methods:

Lookup matrix

One way is to define look-up arrays, where the row names reflect the possible V1 values, and the column names reflect the possible V2 values. (Note that when referencing these lookup matrices, one must use as.character if your values are numeric/integer, since otherwise they will look for the slice/row number, not the specific matching column/row.)

Examples:

dat <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2)
)
dmnms <- list(c(0,1,2), c(0,1,2))
m3 <- matrix(c(0, 1, 2,
               0, NA, 1,
               0, 0, 0),
             nrow = 3, byrow = TRUE, dimnames = dmnms)
m4 <- matrix(c("AA", "AD", "DD",
               "AB", NA, "CD",
               "BB", "BC", "CC"),
             nrow = 3, byrow = TRUE, dimnames = dmnms)

m3
#   0  1 2
# 0 0  1 2
# 1 0 NA 1
# 2 0  0 0
m4
#   0    1    2   
# 0 "AA" "AD" "DD"
# 1 "AB" NA   "CD"
# 2 "BB" "BC" "CC"

in this case, notice the 0, 1, and 2 in the row/column margins. In a matrix with no names, these are typically [1,], [2,], etc, indicating that actual names are not available, instead reflecting just the row number. However, since these are character (no brackets/commas), they can be referenced directly, ala

m3["0","2"]
# [1] 2
m4["1","0"]
# [1] "AB"

From here, you just need to map these lookups into new columns, something like:

dat$V3 <- mapply(`[`, list(m3), as.character(dat$V1), as.character(dat$V2))
dat$V4 <- mapply(`[`, list(m4), as.character(dat$V1), as.character(dat$V2))
dat
#   V1 V2 V3   V4
# 1  0  0  0   AA
# 2  0  1  1   AD
# 3  0  2  2   DD
# 4  1  0  0   AB
# 5  1  1 NA <NA>
# 6  1  2  1   CD
# 7  2  0  0   BB
# 8  2  1  0   BC
# 9  2  2  0   CC

Joining data.frame

Another method is to join a known data.frame onto your data. This has an added benefit of easily expanding to more than two criteria. (Technically, the matrix method can expand to more than 2, in which case it would be an n-dim array, but it is often a little harder to edit, manage, and visualize.)

In your example, this doesn't initially gain you much, since you need to pre-define your data.frame, but I'm guessing that this is just representative data, and your conditional classification is on much more data.

I'll define the joiner data.frame that will be used against your actual data. This is the reference data, from which all input permutations will be defined into the respective V3 and V4 values.

joiner <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2),
  V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
  V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
  stringsAsFactors = FALSE
)

I'll create a sample second data to demonstrate the merge:

dat2 <- data.frame(
  V1 = c(2, 0, 1, 0),
  V2 = c(0, 1, 2, 2)
)
merge(dat2, joiner, by = c("V1", "V2"))
#   V1 V2 V3 V4
# 1  0  1  1 AD
# 2  0  2  2 DD
# 3  1  2  1 CD
# 4  2  0  0 BB

Edit: if you are concerned about dropping rows, then add all.x=TRUE to the merge command. If (as you saw based on your comment) you use all=TRUE, this is a full join in SQL parlance, meaning it will keep all rows from both tables, even if there is not a match made. This may be better explained by referencing this answer and noting that I'm suggesting a left join with all.x, keeping all on the left (first argument), only merging in rows on the right where a match is made.

(Note: this can also be done quite easily using dplyr and data.table packages.)

The first option works in the example I described here. When I tried to apply this method to my actual data I get this error though: Error in .Primitive("[")(dots[[1L]][[1L]], dots[[2L]][[28L]], dots[[3L]][[28L]]) : subscript out of bounds — Yatrosin, Jul 18 '17 at 22:34
That's because you either (a) did not correctly define the row and column names of the lookup matrices, or (b) did not use `as.character`, instead trying to index `m3` or `m4` using numeric/integer indices. I cautioned against this. In my example, `dmnms` (short for dim-names) and therefore `m3` and `m4` *must* include all possible combinations of `V1` and `V2`. — r2evans, Jul 18 '17 at 22:52
Thank you very much for your help. I would have never manage to write this code — Yatrosin, Jul 18 '17 at 23:00
There is something weird with the way I'm applying the merge function. Whenever I apply this method I get two additional observations than the original ones (647 vs 645). When I use Damian's code this does not happen and I get the actual 645 original observations with the additional columns. `merge(DF, joiner, by = c("V1", "V2"), all= TRUE)` . I just added the "all=TRUE" to avoid missing the rows with NA values. — Yatrosin, Jul 18 '17 at 23:12
Okay, good point ... try `merge(..., all.x = TRUE, all.y = FALSE)`. Take a look at the "SQL database terminology" paragraph of [`?merge`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html), with some help on [SQL join types](https://stackoverflow.com/a/20298671/3358272). If that doesn't work, [meet me on chat](https://chat.stackoverflow.com/rooms/149533/temp-help). — r2evans, Jul 18 '17 at 23:25
For the crowd, one remaining issue was resolved in chat: since `merge` changes the order of rows, it might be useful to *first* add a column to indicate the row number, perhaps `dat$row <- 1:nrow(dat)`, and then post-merge, reorder with `dat <- dat[dat$row,]`. — r2evans, Jul 19 '17 at 00:04

Damian · Answer 2 · 2017-07-18T22:54:21.423

Building on the first half of the post by @r2evans, using named vectors to do multidimensional lookups instead of matrices. The logic is the same, but I find the syntax to be easier to interpret/debug.

dat <- data.frame(
    V1 = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
    V2 = c(0, 1, 2, 0, 1, 2, 0, 1, 2)
) 

# Use a named vector to store key-value pairs:
#   keys: combinations of V1 and V2 in text form
#   values: the desired result associated with each combination

V3_lookup <- c("0,0" = 0,
               "0,1" = 1,
               "0,2" = 2,
               "1,0" = 0,
               "1,1" = NA,
               "1,2" = 1,
               "2,0" = 0,
               "2,1" = 0,
               "2,2" = 0)

V4_lookup <- c("0,0" = "AA",
               "0,1" = "AD",
               "0,2" = "DD",
               "1,0" = "AB",
               "1,1" =  NA,
               "1,2" = "CD",
               "2,0" = "BB",
               "2,1" = "BC",
               "2,2" = "CC")

# Create a character vector with the inputs 
# (the combinations of V1 and V2 that actually occur in the data)

id <- paste(dat$V1, dat$V2, sep = ",")
id
[1] "0,0" "0,1" "0,2" "1,0" "1,1" "1,2" "2,0" "2,1" "2,2"


# Map these strings to the desired outputs

dat$V3 <- V3_lookup[ id ]
dat$V4 <- V4_lookup[ id ]

# Result
dat

  V1 V2 V3   V4
1  0  0  0   AA
2  0  1  1   AD
3  0  2  2   DD
4  1  0  0   AB
5  1  1 NA <NA>
6  1  2  1   CD
7  2  0  0   BB
8  2  1  0   BC
9  2  2  0   CC

Not bad, not bad ... though in my experience, the client (or data) will then show `V1` or `V2` values with commas in them, providing unreliable lookups. Contrived, certainly, but shows a hint of fragility of this otherwise elegant and easy-to-read solution. — r2evans, Jul 18 '17 at 22:53
I agree this is much easier to read. Thanks for all your help — Yatrosin, Jul 18 '17 at 23:01

score 1 · Answer 3 · answered Jul 18 '17 at 22:26

In your example you have a suite of potential outputs:

library(dplyr)
df <- data.frame(V1 = rep(0:2, each=3), V2 = rep(0:2, 3))

df %>% 
   mutate(V3 = ifelse(V1==0 & V2==1 | V1==1 & V2==2, 1, 
                  ifelse(V1==0 & V2==2, 2, 
                         ifelse(V1==1 & V2==1, NA,0))),
      V4 = ifelse(V1==0 & V2==0, 'AA',
                  ifelse(V1==0 & V2==1, 'AD',
                         ifelse(V1==0 & V2==2, 'DD',
                                ifelse(V1==1 & V2==0, 'AB',
                                       ifelse(is.na(V3), 'NA',
                                              ifelse(V1==1 & V2==2, 'CD',
                                                     ifelse(V1==2 & V2==0, 'BD',
                                                            ifelse(V1==2 & V2==1, 'BC','CC')))))))))

Without a more complete understanding of your data this is about the best I can come up with.

Mako212 · Answer 4 · 2017-07-18T22:17:05.317

0

You can use an ifelse() statement with multiple conditions:

ifelse(a == 1 & b == 2, TRUE, FALSE)

In R you need to use == to check for equality.

For example:

df <- data.frame(
+ x = c(1,1,2,2),
+ y = c(3,3,4,4))


df$z <- ifelse(df$x==1 & df$y==3, "Yes","No")

df
  x y    z
1 1 3  Yes
2 1 3  Yes
3 2 4   No
4 2 4   No

edited Jul 18 '17 at 22:17

answered Jul 18 '17 at 22:05

Mako212

6,787
1
18
37

I have tried with something similar before: `ifelse(DF$V1=2 & DF$V2 = 2, DF$V3 = 0, DF$V3 = -9)` and I always get the error: Error: unexpected '=' in "ifelse(DF$V1=" – Yatrosin Jul 18 '17 at 22:08
@Yatrosin Note my other comment: "In R you need to use `==` to check for equality." – Mako212 Jul 18 '17 at 22:12
Also, don't specify `DF$V3` in the `ifelse` statement. Instead `DF$V3 <- ifelse(test, 0, -9)` – Mako212 Jul 18 '17 at 22:13

Using ifelse statement in R dataframe to generate additional variables

4 Answers4

Lookup matrix

Joining data.frame