Coalesce two string columns with alternating missing values to one

Question

I have a data frame with two columns, "a" and "b", with alternating missing values (NA):

a      b
dog    <NA>
mouse  <NA>
<NA>   cat
bird   <NA>

I want to "merge" / combine them to a new column c that looks like this, i.e. the non-NA element in each row is selected:

c
dog
mouse
cat
bird

I tried merge and join, but neither worked as I wanted. Maybe that is because I do not have an id to merge on? For integers I would just work around this by adding both columns, but how can I do this in my case?
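A reproducible version of the example data, as a sketch that assumes the missing entries are real `NA` values (not the string "NA") stored in character columns rather than factors:

```r
# Example data with real NA values in character columns (assumed setup)
df <- data.frame(
  a = c("dog", "mouse", NA, "bird"),
  b = c(NA, NA, "cat", NA),
  stringsAsFactors = FALSE  # character, not factor (already the default since R 4.0.0)
)

# Check the assumption behind the question: exactly one non-NA value per row
rowSums(!is.na(df))
# returns c(1, 1, 1, 1)
```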

Are those real `NA` values or fake? – David Arenburg Jan 08 '15 at 22:07

score 12 · Accepted Answer · edited May 23 '17 at 11:52


You may try pmax with na.rm = TRUE (without it, any NA in a row makes the result NA):

df$c <- pmax(df$a, df$b, na.rm = TRUE)
df
#       a    b     c
# 1   dog <NA>   dog
# 2 mouse <NA> mouse
# 3  <NA>  cat   cat
# 4  bird <NA>  bird
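pmax() compares strings by collation order, so it returns "the non-missing value" only because each row has at most one. A small sketch with made-up vectors showing the effect of na.rm and what happens when both values are present:

```r
a <- c("dog", NA, "zebra")
b <- c(NA, "cat", "ant")

# Without na.rm = TRUE, any NA in a position makes that result NA
pmax(a, b)
# returns c(NA, NA, "zebra")

# With na.rm = TRUE the NAs are skipped; where both values are present,
# the one that sorts last ("zebra" over "ant") wins, not necessarily column a
pmax(a, b, na.rm = TRUE)
# returns c("dog", "cat", "zebra")
```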

...or ifelse:

df$c <- ifelse(is.na(df$a), df$b, df$a)

For more general solutions with more than two columns, you can find several ways to implement coalesce in R here.
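One way to generalize the ifelse version to any number of columns, as a sketch: fold it over the columns with base R's Reduce(), so the leftmost non-NA value in each row wins. The third column d below is made up for illustration:

```r
df <- data.frame(
  a = c("dog", NA, NA, "bird"),
  b = c(NA, "mouse", NA, NA),
  d = c(NA, NA, "cat", "owl"),  # hypothetical extra column
  stringsAsFactors = FALSE
)

# Reduce() treats the data frame as a list of columns and applies the
# two-column ifelse() pairwise from left to right
first_non_na <- Reduce(function(x, y) ifelse(is.na(x), y, x), df)
first_non_na
# returns c("dog", "mouse", "cat", "bird")
```

Row 4 keeps "bird" and ignores "owl" because column a comes first in the fold.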

edited May 23 '17 at 11:52

Community


answered Jan 08 '15 at 22:25

Henrik


The best solution for me was the second option using `ifelse`. Thanks – Darwin PC Apr 16 '15 at 05:46

score 10 · Answer 2 · edited Jan 24 '23 at 12:06


dplyr has exactly what you are looking for: the function coalesce().

library(dplyr)

a<-c("dog","mouse",NA,"bird")
b<-c(NA,NA,"cat",NA)

coalesce(a,b)

[1] "dog"   "mouse" "cat"   "bird"

edited Jan 24 '23 at 12:06

Julian


answered Aug 28 '18 at 15:35

Konstantin Mingoulin


score 9 · Answer 3 · answered Jan 08 '15 at 22:15

I wrote a coalesce() function for this type of task, which works much like the SQL COALESCE function. You would use it like this:

dd <- read.table(text="a      b
dog    NA
mouse  NA
NA   cat
bird   NA", header = TRUE, stringsAsFactors = FALSE)

dd$c <- with(dd, coalesce(a, b))
dd
#       a    b     c
# 1   dog <NA>   dog
# 2 mouse <NA> mouse
# 3  <NA>  cat   cat
# 4  bird <NA>  bird
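The answerer's own coalesce() definition is not reproduced in this thread. As an assumption, here is a minimal base-R sketch of an SQL-style, variadic coalesce() that would make the snippet above run; it is not the answerer's implementation, and defining it in your workspace masks dplyr::coalesce() if dplyr is attached:

```r
# Hypothetical SQL-style coalesce(): element by element, return the first
# non-NA value among the arguments (assumes character or numeric vectors, not factors)
coalesce <- function(...) {
  args <- list(...)
  if (length(args) == 0L) stop("coalesce() needs at least one argument")
  out <- args[[1L]]
  for (y in args[-1L]) {
    # Recycle a length-1 fallback (e.g. a default value) to the full length
    if (length(y) == 1L) y <- rep(y, length(out))
    stopifnot(length(y) == length(out))
    gaps <- is.na(out)
    out[gaps] <- y[gaps]
  }
  out
}

dd <- data.frame(a = c("dog", "mouse", NA, "bird"),
                 b = c(NA, NA, "cat", NA),
                 stringsAsFactors = FALSE)
dd$c <- with(dd, coalesce(a, b))
dd$c
# returns c("dog", "mouse", "cat", "bird")
```

A scalar last argument acts as a default, e.g. coalesce(c(NA, "x"), "none") gives c("none", "x").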

David Arenburg · Answer 4 · 2015-01-08T22:27:28.687


Here's my attempt (modified by @MrFlick)

df$c <- apply(df, 1, function(x) na.omit(x)[1])
df
#       a    b     c
# 1   dog <NA>   dog
# 2 mouse <NA> mouse
# 3  <NA>  cat   cat
# 4  bird <NA>  bird
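Because apply() converts the data frame to a character matrix first, each x is a character vector, and na.omit(x)[1] returns NA for a row with no values (indexing an empty vector) and the column a value for a row with two. A quick check on made-up rows:

```r
df <- data.frame(a = c("dog", NA, NA, "bird"),
                 b = c(NA, "cat", NA, "owl"),
                 stringsAsFactors = FALSE)

# na.omit(x)[1]: first non-missing entry of the row, NA if there is none
res <- apply(df, 1, function(x) na.omit(x)[1])
unname(res)
# returns c("dog", "cat", NA, "bird")
```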

edited Jan 08 '15 at 22:27

answered Jan 08 '15 at 22:17

David Arenburg



Wouldn't `apply(df, 1, function(x) na.omit(x)[1])` work just as well here, and be a bit simpler? – MrFlick Jan 08 '15 at 22:25

I would also use `df[which(!is.na(df), arr.ind=TRUE)]` – akrun Jan 09 '15 at 11:51
@akrun, that is a very nice vectorized approach. I would post it as your own answer – David Arenburg Jan 09 '15 at 11:51

score 5 · Answer 5 · answered Jan 09 '15 at 11:56

Another option is to use which() with arr.ind = TRUE:

indx <- which(!is.na(df), arr.ind=TRUE)
df$c <-  df[indx][order(indx[,1])]
df
#    a    b     c
#1   dog <NA>   dog
#2 mouse <NA> mouse
#3  <NA>  cat   cat
#4  bird <NA>  bird
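The order() step is needed because which() walks the logical matrix column by column. A sketch of the intermediate values on the question's data:

```r
df <- data.frame(a = c("dog", "mouse", NA, "bird"),
                 b = c(NA, NA, "cat", NA),
                 stringsAsFactors = FALSE)

indx <- which(!is.na(df), arr.ind = TRUE)
# Column-major scan: column a yields rows 1, 2, 4; column b yields row 3
indx[, "row"]
# returns c(1, 2, 4, 3)

# Matrix-indexing the data frame follows the same column-major order...
df[indx]
# returns c("dog", "mouse", "bird", "cat")

# ...so reorder by row number to line the values up with the rows
df[indx][order(indx[, "row"])]
# returns c("dog", "mouse", "cat", "bird")
```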

Or

df$c <- df[cbind(1:nrow(df),max.col(!is.na(df)))]
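Spelled out, the max.col() variant builds a (row, column) index matrix with one cell per row. A sketch on made-up data where row 2 has both values; ties.method = "first" is an addition here (not in the answer) so that such a row deterministically takes column a:

```r
df <- data.frame(a = c("dog", "mouse", NA, "bird"),
                 b = c(NA, "rat", "cat", NA),
                 stringsAsFactors = FALSE)

# Index of the first TRUE column in each row of the logical matrix;
# the default ties.method = "random" would pick either column for row 2
cols <- max.col(!is.na(df), ties.method = "first")
cols
# returns c(1, 1, 2, 1)

# A two-column (row, column) matrix selects one cell per row
df$c <- df[cbind(seq_len(nrow(df)), cols)]
df$c
# returns c("dog", "mouse", "cat", "bird")
```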

LyzandeR · Answer 6 · 2015-01-10T17:16:39.790


You could use a simple apply:

df$c <- apply(df, 1, function(x) x[!is.na(x)])

> df
      a    b     c
1   dog <NA>   dog
2 mouse <NA> mouse
3  <NA>  cat   cat
4  bird <NA>  bird
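This relies on every row having exactly one non-NA value. If some row has two (or none), apply() cannot simplify the results and returns a list, which no longer fits into a single column. A sketch on made-up data, with one possible vapply() fix that keeps the first value:

```r
df <- data.frame(a = c("dog", NA, "bird"),
                 b = c(NA, "cat", "owl"),
                 stringsAsFactors = FALSE)

res <- apply(df, 1, function(x) x[!is.na(x)])
is.list(res)   # TRUE: row 3 yields two values
lengths(res)
# returns c(1, 1, 2)

# Keep the first value per row, NA if the row is empty
first <- vapply(res, function(v) if (length(v) > 0) v[[1]] else NA_character_,
                character(1))
first
# returns c("dog", "cat", "bird")
```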

edited Jan 10 '15 at 17:16

answered Jan 08 '15 at 22:17

LyzandeR


score 2 · Answer 7 · answered Jul 16 '20 at 21:52

Using ifelse() logic:

a<-c("dog","mouse",NA,"bird")
b<-c(NA,NA,"cat",NA)

test.df <-data.frame(a,b, stringsAsFactors = FALSE)
test.df$c <- ifelse(is.na(test.df$a), test.df$b, test.df$a)

test.df

      a    b     c
1   dog <NA>   dog
2 mouse <NA> mouse
3  <NA>  cat   cat
4  bird <NA>  bird

score 0 · Answer 8 · answered Jul 03 '23 at 13:39


Use tidyr::unite() to be safe in case a row contains two values (they are then pasted together, separated by the default sep = "_"):

library(tidyr)

df <- df |>
  unite(c,
        c(a, b),
        remove = FALSE,
        na.rm = TRUE)
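For comparison, a base-R sketch that approximates the same idea without tidyr (this is an assumed equivalent, not tidyr's implementation): paste the non-missing values of each row together with "_". The second row of the made-up data has both values:

```r
df <- data.frame(a = c("dog", "mouse", NA, "bird"),
                 b = c(NA, "rat", "cat", NA),
                 stringsAsFactors = FALSE)

# Drop the NAs in each row, then join whatever is left with "_"
df$c <- apply(df[c("a", "b")], 1, function(x) paste(x[!is.na(x)], collapse = "_"))
df$c
# returns c("dog", "mouse_rat", "cat", "bird")
```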

answered Jul 03 '23 at 13:39

user1

