Suppose you have:

df = data.frame(a = c(1,2,NA),b = c(NA, 1,2))
> df
   a  b
1  1 NA
2  2  1
3 NA  2

and wish to create a new column c based on a. If a is missing, then use b. This works:

df %>% mutate(c= a,
              c = replace(c, is.na(a), b[is.na(a)]))

but (to me, just me?) looks clumsy (in the sense that I have to spell out is.na(a) twice). This is easier:

df %>%
   rowwise() %>% 
   mutate(c = a,
          c = replace(c, is.na(a), b))

but it requires the extra rowwise() command, and I could imagine situations where some of my mutate statements will not work rowwise.

Am I missing some dplyr feature that makes this (very common?) task easier?

safex

1 Answer

For this, you can use coalesce() from dplyr:

df %>%
 mutate(c = coalesce(a, b))

   a  b c
1  1 NA 1
2  2  1 2
3 NA  2 2

From the documentation:

Given a set of vectors, coalesce() finds the first non-missing value at each position.
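
To make the quoted behaviour concrete, here is a minimal sketch on bare vectors. The vectors x and y and the fallback value 0 are made up for illustration and are not part of the question:

```r
library(dplyr)

# Hypothetical vectors, not taken from the question
x <- c(1, NA, NA)
y <- c(NA, 2, NA)

# Position by position, keep the first non-missing value
first_non_missing <- coalesce(x, y)

# A trailing length-one value acts as a fallback where every input is NA
with_fallback <- coalesce(x, y, 0)
```

The third position stays NA in first_non_missing because both inputs are missing there; adding the fallback fills it.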

Or if you want to apply it on the whole df:

df %>%
 mutate(c = coalesce(!!!.))
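
In that call, . is the data frame handed over by the %>% pipe and !!! splices its columns into separate arguments, so the columns are tried left to right. A rough sketch of the same idea without the pipe placeholder, assuming the df from the question:

```r
library(dplyr)

# The example data frame from the question
df <- data.frame(a = c(1, 2, NA), b = c(NA, 1, 2))

# Splice the columns of df explicitly; they are tried in order: a, then b
df$c <- coalesce(!!!df)
```

Because the result depends on column order, it may be worth selecting or reordering the columns first if the data frame holds more than the ones you want to combine.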
tmfmnk