0

I have a dataset like the generic dataset below and wish to join the three sitevariables into one.

 df <- data.frame(var1 = c("site1", NA, NA, NA, "site3", "site4"),
                  var2 = c(NA, NA, "NA", "site2", NA, "site4"),
                  var3 = c("site1", NA, "Site2", "site2", "site3", NA))

   var1  var2  var3
1 site1  <NA> site1
2  <NA>  <NA>  <NA>
3  <NA>    NA Site2
4  <NA> site2 site2
5 site3  <NA> site3
6 site4 site4  <NA>

The data frame I wish to obtain should thus be like this:

       var1  var2  var3  var4
1 site1  <NA> site1 site1
2  <NA>  <NA>  <NA>  <NA>
3  <NA>    NA Site2 site2
4  <NA> site2 site2 site2
5 site3  <NA> site3 site3
6 site4 site4  <NA> site4

I have tried using coalesce in the sqldf package, which gives the correct output, but str() reveals that the output is a data frame with one variable. and not a string of factors as I need.

library(sqldf)
df$var4 <-sqldf("select coalesce(var1, var2, var3) from df")
> df
   var1  var2  var3 coalesce(var1, var2, var3)
1 site1  <NA> site1                      site1
2  <NA>  <NA>  <NA>                       <NA>
3  <NA>    NA Site2                         NA
4  <NA> site2 site2                      site2
5 site3  <NA> site3                      site3
6 site4 site4  <NA>                      site4
> str(df)
'data.frame':   6 obs. of  4 variables:
 $ var1: Factor w/ 3 levels "site1","site3",..: 1 NA NA NA 2 3
 $ var2: Factor w/ 3 levels "NA","site2","site4": NA NA 1 2 NA 3
 $ var3: Factor w/ 4 levels "site1","site2",..: 1 NA 3 2 4 NA
 $ var4:'data.frame':   6 obs. of  1 variable:
  ..$ coalesce(var1, var2, var3): chr  "site1" NA "NA" "site2" ...
Warning message:
zx8754
  • 52,746
  • 12
  • 114
  • 209
Thoegernh
  • 85
  • 1
  • 5
  • Relevant post: http://stackoverflow.com/questions/19253820/how-to-implement-coalesce-efficiently-in-r – zx8754 Jun 06 '16 at 09:41

5 Answers5

2

You can use na.locf from zoo package to get var4,

library(zoo)
df$var4 <- na.locf(t(df))[ncol(df),]
df
#   var1  var2  var3  var4
#1 site1  <NA> site1 site1
#2  <NA>  <NA>  <NA>  <NA>
#3  <NA>    NA Site2 Site2
#4  <NA> site2 site2 site2
#5 site3  <NA> site3 site3
#6 site4 site4  <NA> site4
Sotos
  • 51,121
  • 6
  • 32
  • 66
2

Another option is to use something like this:

df$var4 <- apply(df, 1, min, na.rm = TRUE)

Output is as follows:

   var1  var2  var3  var4
1 site1  <NA> site1 site1
2  <NA>  <NA>  <NA>  <NA>
3  <NA>  <NA> Site2 Site2
4  <NA> site2 site2 site2
5 site3  <NA> site3 site3
6 site4 site4  <NA> site4
Gopala
  • 10,363
  • 7
  • 45
  • 77
1

You can use cbind() to merge the two data frame like:

tmp = sqldf("select coalesce(var1, var2, var3) from df")
df = cbind(df, tmp)

then change the colname by

colnames(df)[4] = 'var4'
Yang
  • 26
  • 4
1

Maybe this might be a possible solution :

df<- data.frame(var1 = c("site1", NA, NA, NA, "site3", "site4"),
                var2 = c(NA, NA, "NA", "site2", NA, "site4"),
                var3 = c("site1", NA, "Site2", "site2", "site3", NA))

getLastNonNA <- function(v){
  notNAs <- !is.na(v)
  if(!any(notNAs))
    return(NA)
  tail(v[notNAs],1)
}

# if you prefer the first non-NA
#getFirstNonNA <- function(v){
#  notNAs <- !is.na(v)
#  if(!any(notNAs))
#    return(NA)
#  head(v[notNAs],1)
#}

df$var4 <- apply(df,1,getLastNonNA)

> df
   var1  var2  var3  var4
1 site1  <NA> site1 site1
2  <NA>  <NA>  <NA>  <NA>
3  <NA>    NA Site2 Site2
4  <NA> site2 site2 site2
5 site3  <NA> site3 site3
6 site4 site4  <NA> site4

Note that the last column is a characters vector, but you can easily convert to factor if you want (using as.factor).

digEmAll
  • 56,430
  • 9
  • 115
  • 140
1

We can use max.col

df$var4 <- df[cbind(1:nrow(df), max.col(!is.na(df), "first"))]
df
#   var1  var2  var3  var4
#1 site1  <NA> site1 site1
#2  <NA>  <NA>  <NA>  <NA>
#3  <NA>  <NA> Site2 Site2
#4  <NA> site2 site2 site2
#5 site3  <NA> site3 site3
#6 site4 site4  <NA> site4
akrun
  • 874,273
  • 37
  • 540
  • 662