2

I have a data frame with missing values in some columns (who doesn't?). For example:

df <- data.frame(x = c(2,NA,4), y = 5:7)
df
   x y
1  2 5
2 NA 6
3  4 7

I would like to replace each missing value with the value from a different column in the same row. Obviously there are a lot of ways to do so, for example:

library(dplyr)

df %>%
  mutate(x = ifelse(is.na(x), y, x))

  x y
1 2 5
2 6 6
3 4 7

However, I am looking for something more elegant, like

df %>% fill(x,y) 

but couldn't find anything. Does something like this exist?

Thanks!

Adiel Loinger
  • 1
    Instead of ifelse you could use dplyr's `coalesce` function, i.e. `df %>% mutate(x = coalesce(x, as.numeric(y)))` – talat Nov 09 '17 at 11:51
  • 1
    If there was a pure `tidyr` solution, I bet it would have appeared here: [How to implement coalesce efficiently in R](https://stackoverflow.com/questions/19253820/how-to-implement-coalesce-efficiently-in-r) – Henrik Nov 09 '17 at 12:17
  • Thank you for your suggestions. – Adiel Loinger Nov 10 '17 at 05:19

3 Answers

6

You want to change values in a single column, keeping the same number of rows. The tidyverse way to do that is dplyr::mutate, and the tidyverse implementation of the specific operation you want is dplyr::coalesce, as docendo discimus suggested:

df %>% mutate(x = coalesce(x, y))

Things would be less tidy and less consistent if there were a single function that combined these two steps, because the operation acts on a single column rather than on the whole data frame. It would also be less flexible, as coalesce can be used on vectors whether or not they are in a data frame, which is good!
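To illustrate the point about flexibility, here is a minimal sketch of coalesce applied first to bare vectors and then inside mutate. It assumes dplyr is installed; the vectors a and b are made up for this example. y is converted with as.numeric, as in talat's comment, because older dplyr versions required both arguments to have the same type.

```r
library(dplyr)

# Hypothetical vectors for illustration, not part of the original question
a <- c(2, NA, 4)
b <- c(5, 6, 7)

# coalesce() returns, position by position, the first non-missing value
filled <- coalesce(a, b)

# The same call works inside mutate() on the question's data frame
df <- data.frame(x = c(2, NA, 4), y = 5:7)
result <- df %>% mutate(x = coalesce(x, as.numeric(y)))
```

Because coalesce takes any number of vectors, the same pattern extends to a chain of fallback columns without extra nesting.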


(I actually dislike tidyr::fill. I suppose it is consistent because it operates on all columns of the data frame, but I would prefer that it took a single vector and was typically used inside mutate. With that design, mutate_all(fill) would be an easy way to do the whole data frame. So I still end up relying on zoo::na.locf for general use.)
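As a rough sketch of the vector-based style described here, zoo::na.locf can be called on a single column inside mutate. This assumes the zoo and dplyr packages are installed. Note that it fills from the previous row of the same column, so it is a different operation from the question's fill-from-y, and it is shown only to illustrate the interface.

```r
library(dplyr)
library(zoo)

df <- data.frame(x = c(2, NA, 4), y = 5:7)

# na.locf() carries the last observation forward within one vector.
# na.rm = FALSE keeps leading NAs, so the output has the same length as the input.
locf_result <- df %>% mutate(x = na.locf(x, na.rm = FALSE))
```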

Gregor Thomas
3

I am aware I don't fully answer the question, but I find the standard data frame way not so bad:

df$x[is.na(df$x)] <- df$y[is.na(df$x)]

and the data.table way quite simple and elegant:

setDT(df)  # needs library(data.table); converts df to a data.table by reference
df[is.na(x), x := y]
denis
0

Try this (it assumes exactly two columns and at most one NA per row). Good luck!

df <- t(apply(df, 1, function(x) if(any(is.na(x))) rep(x[!is.na(x)], 2) else x))
as.data.frame(df)
myincas