1

I would like to create a vector based on conditions of several other vectors. The conditions are in a decreasing order of priority. Here a simple example in which I am creating the variable 'see1' which is supposed to contain different letters (but not NA). The priority for creating it is hierarchical: l1 > l2 > l3 > l4. E.g. 'see1' can only be assigned the status of 'l4' if all other conditions are NA and it will be assigned automatically the status of 'l1' if 'l1' is not NA ('l1' overrules other columns). I have used a nested ifelse to create 'see1'.

test <- data.frame(id=c("a","b","c","d","e","f"),
               l1=c(NA,NA,"A",NA,"B", NA),
               l2=c(NA,NA,"N","N",NA,NA),
               l3=c("V",NA,NA,NA,"V","V"), 
               l4=c("H","H",NA,NA,rep("H",2)), stringsAsFactors=F)
test$see1 <- ifelse(test$l1%in%c("A", "B"), test$l1,
               ifelse(test$l2%in%"N", "N", 
                  ifelse(test$l3%in%"V", "V",
                        ifelse(test$l4%in%"H","H", NA))))
test

id   l1   l2   l3   l4 see1
1  a <NA> <NA>    V    H    V
2  b <NA> <NA> <NA>    H    H
3  c    A    N <NA> <NA>    A
4  d <NA>    N <NA> <NA>    N
5  e    B <NA>    V    H    B
6  f <NA> <NA>    V    H    V

However, with many conditions/columns, this task becomes cumbersome. I have scanned similar questions about 'nested ifelse' but did not encounter this problem.

Tomiris
  • 69
  • 5
  • 1
    With this example in particular (replacing NA's with the first non-NA value), the operation you're looking for is called `coalesce` in SQL. [Here's a nice question on implementing it in R](http://stackoverflow.com/q/19253820/903061). – Gregor Thomas Mar 18 '15 at 16:17
  • Thanks, Gregor! That helped! I have posted the solution below. – Tomiris Apr 20 '15 at 13:31

2 Answers2

2

You could try max.col with ties.method='first' on the l\\d' columns to create the column index.cbindwith1:nrow(test)` extract the elements from the subset of 'test' dataset based on the row/column index.

nm <- grep('^l\\d+', names(test))
test[nm][cbind(1:nrow(test), max.col(!is.na(test[nm]), 'first'))]
#[1] "V" "H" "A" "N" "B" "V"

Or some options using apply

 apply(test[nm], 1, function(x) x[Position(function(y) !is.na(y), x)])
 #[1] "V" "H" "A" "N" "B" "V"

  apply(test[nm], 1, function(x) x[!is.na(x)][1])
  #[1] "V" "H" "A" "N" "B" "V"
akrun
  • 874,273
  • 37
  • 540
  • 662
2

Here is the coalesce-solution:

First, reorder columns in test (not necessary in my example as columns were ordered but might be important in other cases)

require(dplyr)
require(magrittr) # for piping
test %<>% select(l1,l2, l3, l4)

Now use coalesce function

coalesce2 <- function(...){
 Reduce(function(x,y) {
    i<-which(is.na(x))
    x[i]<-y[i]
    x},
    list(...))
}

test$see1 <- coalesce2(test$l1,test$l2, test$l3, test$l4)
test

or (again) with the help of the magrittr package

require(magrittr)
test$see1 <- test%$% coalesce2(l1,l2, l3, l4)
test

>    l1   l2   l3   l4 see1
>1 <NA> <NA>    V    H    V
>2 <NA> <NA> <NA>    H    H
>3    A    N <NA> <NA>    A
>4 <NA>    N <NA> <NA>    N
>5    B <NA>    V    H    B
>6 <NA> <NA>    V    H    V
Tomiris
  • 69
  • 5