missing values filling in R

Question

I want help on R programming to fill col4 col4=col1, in case col1 is NA then col4=col2, in case col1 and col2 are NA then col4=col3

id col1 col2 col3
1   10   NA   NA
2   NA   12   NA
3   NA   NA   13
4   NA   NA   1
5   2    3    NA

Answer:

I guess, from your expected answer, you want col4 = col1 when col1 is NOT NA, right? — Felipe Alvarenga, Jan 31 '18 at 20:24
`df$col4 <- with(df, ifelse(!is.na(col1), col1, ifelse(!is.na(col2), col2, col3)))` — jogo, Jan 31 '18 at 20:25
@zx8754 Is this really a duplicate of the link provided ? The difference is that this asks how to perform a task (the role `coalesce` fills), and the link is to a question about efficient implementation of coalesce. There may be overlap but these are not quite the same. — steveb, Jan 31 '18 at 20:55
@steveb by that logic, if there is a post saying "I want smallest value for the column." is not a duplicate to an existing "How do I use min function in R" ? — zx8754, Jan 31 '18 at 21:00
If you want a more appropriate dupe (though the title sucks), try [Nested `ifelse` is the worse, what's the best?](https://stackoverflow.com/q/30088919/903061). — Gregor Thomas, Jan 31 '18 at 21:03
@zx8754 I agree with steveb. Since `coalesce` is not the only method to perform the operation, while the duplicate discusses how to use the `coalesce` function specifically. By analogy, the `min` function is not the only way to calculate the smallest value for the column, but you wouldn't post an answer about the other methods in the "How do I use min function in R" question. — acylam, Jan 31 '18 at 21:03
@zx8754 I don't think that is a correct analogy, I think, to use your example, it is more like "I want smallest value for the column." vs. "how do build an optimized min function". — steveb, Jan 31 '18 at 21:05
@steveb If the OP knew about "coalesce" functionality from any language, then this post wouldn't exist. Quick Google search would point them to the linked post. — zx8754, Jan 31 '18 at 21:19
@useR I don't understand your comment: "coalesce" is not base, it can be custom function or from dplyr package, as provided in answers in target post. — zx8754, Jan 31 '18 at 21:32
@steveb No, linked post OP knows about coalesce function from SQL and shares his attempt in R hoping to improve. Many years later, now we have this function as part of dplyr (again mentioned in target post). And this OP is saying "I am trying to do X", and the correct answer is "You are trying to do coalesce, see this post" — zx8754, Jan 31 '18 at 21:41
@zx8754 Now that I read it carefully, I probably mistaken that you (and the dup) were referring to the `coalesce` _function_, instead of _functionality_, hence my argument. As it stands, I think my solution is still valuable in that it takes advantage of explicit splicing. — acylam, Jan 31 '18 at 21:51

acylam · Accepted Answer · 2018-01-31T20:58:14.713

Easily done with coalesce from dplyr. This solution works for N number of columns:

library(dplyr)
data %>%
  mutate(col4 = coalesce(!!!data[-1]))

Result:

  id col1 col2 col3 col4
1  1   10   NA   NA   10
2  2   NA   12   NA   12
3  3   NA   NA   13   13
4  4   NA   NA    1    1
5  5    2    3   NA    2

Data:

data = read.table(text = "id col1 col2 col3
                1 10 NA NA 
                2 NA 12 NA 
                3 NA NA 13
                4 NA NA 1 
                5 2 3 NA", header = T)

Notes:

!!! shouldn't be confused with the negation operator ! (understandable confusion). It is an operator that is part of rlang, or the tidyverse (also available to dplyr) which enables explicit splicing.

What this means is that instead of inputting the entire data frame into coalesce (coalesce(data[-1])), I am separating the columns of data[-1] (or elements of the list) and have each element as an input to coalesce. So this:

coalesce(!!!data[-1])

is actually equivalent to this:

coalesce(col1, col2, col3)

The advantage of writing it this way is that you don't have to know the column names nor how many columns there are to begin with.

Not sure why this answer gets downvoted. Would be nice if the downvoter can comment on things I can improve on. — acylam, Jan 31 '18 at 21:51

score 0 · Answer 2 · answered Jan 31 '18 at 20:31

0

Using dplyr::coalesce, or any of the answers at How to implement coalesce in R?:

xx$col4 = with(xx, coalesce(col1, col2, col3))

answered Jan 31 '18 at 20:31

Gregor Thomas

136,190
20
167
294

4

Why not hammer as duplicate? – zx8754 Jan 31 '18 at 20:35
Thank Gregor. This is working – Dinesh Ghanta Feb 02 '18 at 18:20

missing values filling in R

2 Answers2