Fill a new column from multiple columns if they exist

Question

Example data.frame:

df <- data.frame(col_1=c("A", NA, NA), col_2=c(NA, "B", NA), col_3=c(NA, NA, "C"), other_col=rep("x", 3), stringsAsFactors=FALSE)
df
  col_1 col_2 col_3 other_col
1     A  <NA>  <NA>         x
2  <NA>     B  <NA>         x
3  <NA>  <NA>     C         x

I can create a new column new_col filled with non-NA values from the 3 columns col_1, col_2 and col_3:

df %>%
  mutate(new_col = case_when(
    !is.na(col_1) ~ col_1,
    !is.na(col_2) ~ col_2,
    !is.na(col_3) ~ col_3,
    TRUE ~ "none"))

  col_1 col_2 col_3 other_col new_col
1     A  <NA>  <NA>         x       A
2  <NA>     B  <NA>         x       B
3  <NA>  <NA>     C         x       C

However, sometimes the number of columns from which I pick the new_col value can vary.

How could I check that the columns exist before applying the previous case_when command?

The following triggers an error:

df %>%
  select(-col_3) %>%
  mutate(new_col = case_when(
    !is.null(.$col_1) & !is.na(col_1) ~ col_1,
    !is.null(.$col_2) & !is.na(col_2) ~ col_2,
    !is.null(.$col_3) & !is.na(col_3) ~ col_3,
    TRUE ~ "none"))

Error: Problem with `mutate()` input `new_col`.
   x object 'col_3' not found
   ℹ Input `new_col` is `case_when(...)`.

@bouncyball `apply` does not seem to help, sorry (see my edit) — u31889, Feb 11 '21 at 16:14
I am not sure this is a duplicate, unless those other answers can handle the case where one of the columns is missing. I had missed that part of the question at first. — , Feb 11 '21 at 16:21

dyrland · Accepted Answer · 2021-02-11T16:23:27.403

4

I like Adam's answer, but if you want to combine the values from col_1 and col_2 when both are non-NA in the same row, you can use unite() instead:

library(tidyverse)
df %>%
  unite(new_col, starts_with("col"), remove = FALSE, na.rm = TRUE)

Edit to respond to: "How could I check that the columns exist before applying the previous case_when command?"

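You won't need to check with this command: starts_with("col") only selects the columns that are actually present. If your columns to unite aren't named consistently, replace starts_with("col") with any_of(c("your_name_1", "your_name_2", "etc.")). A plain c(...) of names would raise an error if one of those columns were missing, whereas any_of() skips absent names.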
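
As a rough sketch of the explicit-names case, the example below drops col_3 and still runs. The `wanted` vector and the final `if_else()` step are illustrative additions, not part of the original answer. The `if_else()` is there because `unite()` with `na.rm = TRUE` leaves an empty string when every selected column is NA, and the question asked for "none" in that situation.

```r
library(dplyr)
library(tidyr)

df <- data.frame(col_1 = c("A", NA, NA), col_2 = c(NA, "B", NA),
                 col_3 = c(NA, NA, "C"), other_col = rep("x", 3),
                 stringsAsFactors = FALSE)

# Columns we would like to use (hypothetical name); col_3 is removed below
wanted <- c("col_1", "col_2", "col_3")

res <- df %>%
  select(-col_3) %>%                        # simulate the missing column
  unite(new_col, any_of(wanted),            # any_of() ignores absent names
        remove = FALSE, na.rm = TRUE) %>%
  # rows where all selected columns were NA come out as "", so map them to "none"
  mutate(new_col = if_else(new_col == "", "none", new_col))

res
```

Keep in mind that when several of the selected columns are non-NA in the same row, `unite()` pastes them together with its default separator "_" rather than picking the first one.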

edited Feb 11 '21 at 16:23

answered Feb 11 '21 at 16:18


I might like this better. Should add though that `unite()` is from `tidyr`. – Feb 11 '21 at 16:19
Good point! Added a `library(tidyverse)` edit. – dyrland Feb 11 '21 at 16:23
Brilliant ! Thanks ! – u31889 Feb 11 '21 at 16:29

score 3 · Answer 2 · answered Feb 11 '21 at 16:14

You can use coalesce(), which returns, for each row, the first non-NA value among its arguments.

library(dplyr)

# vector of all the columns you might want
candidate_cols <- paste("col", 1:3, sep = "_")

# convert to symbol only the ones in the dataframe
check_cols <- syms(intersect(candidate_cols, names(df)))

# coalesce over the columns to check
df %>% 
  mutate(new_col = coalesce(!!!check_cols))

#  col_1 col_2 col_3 other_col new_col
#1     A  <NA>  <NA>         x       A
#2  <NA>     B  <NA>         x       B
#3  <NA>  <NA>     C         x       C
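
To reproduce the question's "none" fallback and to check that a missing column is tolerated, a sketch along these lines may help. The `select(-col_3)` step, the `df2` name and the trailing "none" argument are additions for illustration only.

```r
library(dplyr)

df <- data.frame(col_1 = c("A", NA, NA), col_2 = c(NA, "B", NA),
                 col_3 = c(NA, NA, "C"), other_col = rep("x", 3),
                 stringsAsFactors = FALSE)

candidate_cols <- paste("col", 1:3, sep = "_")

# simulate the case where one of the candidate columns is absent
df2 <- df %>% select(-col_3)

# keep only the candidates that are actually present, as symbols
check_cols <- syms(intersect(candidate_cols, names(df2)))

# the trailing "none" is used when every checked column is NA in a row
res <- df2 %>%
  mutate(new_col = coalesce(!!!check_cols, "none"))

res
```

Because `intersect()` filters the candidate names before they are turned into symbols, `mutate()` never refers to a column that is not in the data frame.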
