Select first non-NA value by row

Question

I have data like this:

df <- data.frame(id=c(1, 2, 3, 4), A=c(6, NA, NA, 4), B=c(3, 2, NA, NA), C=c(4, 3, 5, NA), D=c(4, 3, 1, 2))

   id A  B  C D
1  1  6  3  4 4
2  2 NA  2  3 3
3  3 NA NA  5 1
4  4  4 NA NA 2

For each row: If the row has non-NA values in column "A", I want that value to be entered into a new column 'E'. If it doesn't, I want to move on to column "B", and that value entered into E. And so on. Thus, the new column would be E = c(6, 2, 5, 4).

I wanted to use the ifelse function, but I am not quite sure how to do this.

(No need to apologize for lack of experience! Good first question, btw.) — r2evans, Feb 24 '21 at 18:08

score 1 · Accepted Answer · answered Feb 24 '21 at 18:02

tidyverse

library(dplyr)
mutate(df, E = coalesce(A, B, C, D))
#   id  A  B  C D E
# 1  1  6  3  4 4 6
# 2  2 NA  2  3 3 2
# 3  3 NA NA  5 1 5
# 4  4  4 NA NA 2 4

coalesce is effectively "return the first non-NA in each vector". It has a SQL equivalent (or it is an equivalent of SQL's COALESCE, actually).

base R

df$E <- apply(df[,-1], 1, function(z) na.omit(z)[1])
df
#   id  A  B  C D E
# 1  1  6  3  4 4 6
# 2  2 NA  2  3 3 2
# 3  3 NA NA  5 1 5
# 4  4  4 NA NA 2 4

na.omit removes all of the NA values, and [1] makes sure we always return just the first of them. The advantage of [1] over (say) head(., 1) is that head will return NULL if there are no non-NA elements, whereas .[1] will always return at least an NA (indicating to you that it was the only option).

A side-note: if OP is happy with non-`base`: `data.table::fcoalesce` takes a `data.frame`, so then it's possible to do `df$E = fcoalesce(df[-1])`, without entering all individual columns. — Henrik, Feb 24 '21 at 19:32

Select first non-NA value by row

1 Answers1

tidyverse

base R