3

I have the below dataset that is built from

help <- data.frame(var1 = c("red", NA, NA, NA, "red", "orange"),
               var2 = c(NA, "lightred", "blue", "lightred", NA, NA))

    var1     var2
1    red     <NA>
2   <NA> lightred
3   <NA>     blue
4   <NA> lightred
5    red     <NA>
6 orange     <NA>

And I am stuck trying to make a new variable, newvar that simply merges the factor variables into one new column. I am hoping to get output as follows

    var1     var2   newvar
1    red     <NA>    red
2   <NA> lightred    lightred
3   <NA>     blue    blue
4   <NA> lightred    lightred
5    red     <NA>    red
6 orange     <NA>    orange

Here was my attempt based on other threads here

help$newvar = ifelse(help$var1 == "", help$var2, help$var1)

When the variables are factors, the newvar are numbers, which makes sense, but only variables from var1 move to newvar. When character, again only the characters from var1 move to newvar.

I know similar questions have been asked, but the answers didn't seem to fix this issue. Merge two factor columns in R

Is there a way to do this in dplyr too? I'll take whatever solution I can get, just curious.

Community
  • 1
  • 1
b222
  • 966
  • 1
  • 9
  • 19

4 Answers4

3

I don't see any blanks in your var1, I see missing values. Use is.na(help$var1) instead of help$var1 == ""

You'll also want to convert to character since you have factors with different levels.

help$newvar = ifelse(is.na(help$var1), as.character(help$var2), as.character(help$var1))

Of course you can use dplyr, you're just creating a column

help = mutate(help, newvar = ifelse(is.na(var1), as.character(var2), as.character(var1)))

In SQL, this operation is called coalesce. You might be interested in How to implement coalesce efficiently in R

Community
  • 1
  • 1
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Ok great. I see what you mean with the is.na... I was messing with both "" and NA when I built this df for the example. When I leave it as a factor, that code gives me numbers for `newvar` and when i convert to character it works well – b222 Sep 30 '15 at 20:17
  • Also thank you for the resource regarding coalescing – b222 Sep 30 '15 at 20:19
  • Thanks, edited in the character conversion. The real lesson here is that a blank is different than a missing value. – Gregor Thomas Sep 30 '15 at 20:19
  • The coalesce answer predates `tidyr`, but the other answer with `tidyr::unite` is probably more convenient. – Gregor Thomas Sep 30 '15 at 20:21
  • Why can't you just use `transform` from base R instead of `dplyr::mutate`? It's exactly the same syntax – David Arenburg Sep 30 '15 at 20:25
  • @DavidArenburg Just giving OP what OP asks for: "Is there a way to do this in dplyr too?" I hope my answer illustrates that you're just adding a column however you do it---that's why I started with `help$newvar = ...` – Gregor Thomas Sep 30 '15 at 20:37
3

If you want to use tidyr:

library(tidyr)

help[] <- lapply(help, as.character)
help[is.na(help)] <- ''
unite(help, new, var1:var2, sep='', remove=F)

#       new   var1     var2
#1      red    red         
#2 lightred        lightred
#3     blue            blue
#4 lightred        lightred
#5      red    red         
#6   orange orange      

Or a base R hack:

transform(help, new = gsub('NA','',do.call(paste0, help))

#    var1     var2      new
#1    red     <NA>      red
#2   <NA> lightred lightred
#3   <NA>     blue     blue
#4   <NA> lightred lightred
#5    red     <NA>      red
#6 orange     <NA>   orange   
Colonel Beauvel
  • 30,423
  • 11
  • 47
  • 87
2

Here's another way of just subsetting the non- NAs values out of help

t(help)[t(!is.na(help))]
## [1] "red"      "lightred" "blue"     "lightred" "red"      "orange"  

The need in t is because the subsetting in R is been done by column instead by row

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
1

In this case the sqldf package comes in handy to use, as Gregor pointed out, the function coalesce:

 library(sqldf)
 help$newvar <- sqldf("select coalesce(var1, var2) newvar from help")

Output:

    var1     var2   newvar
1    red     <NA>      red
2   <NA> lightred lightred
3   <NA>     blue     blue
4   <NA> lightred lightred
5    red     <NA>      red
6 orange     <NA>   orange
mpalanco
  • 12,960
  • 2
  • 59
  • 67