
I have two factor columns that I want to merge into one. Since I have a lot of observations, I wonder if there's a quick option with dplyr or tidyr.

Col1    Col2
 A        NA
 B        NA
 NA       C
 A        A
 NA       B
 A        NA
 B        B

I know that this shouldn't be difficult, but I'm clearly missing something here. I've tried several options, but none of the ones I know keep the columns as factors.

Note that when both columns have a value, the two values are always the same; this is a characteristic of my data. I expect to get something like:

Col1    Col2     Col3
 A        NA      A
 B        NA      B
 NA       C       C
 A        A       A
 NA       B       B
 A        NA      A
 B        B       B
FilipeTeixeira
  • What if none of the columns are NA? – Ronak Shah Mar 20 '17 at 13:19
  • If they are not NA, they will be exactly the same. But I'll edit that on the question. – FilipeTeixeira Mar 20 '17 at 13:19
  • I don't think dplyr or tidyr is suited well for such a task. I would go with the good ol' base R `df[cbind(1:nrow(df), max.col(!is.na(df)))]` – David Arenburg Mar 20 '17 at 13:20
  • Or using `ifelse`: `df$c <- ifelse(is.na(df$col1), as.character(df$col2), as.character(df$col1))`. You could wrap it in `factor` if that is necessary. – lmo Mar 20 '17 at 13:23
  • @DavidArenburg and lmo, unfortunately both convert my factor into characters, and I'm looking for an option that keeps the original levels. – FilipeTeixeira Mar 20 '17 at 13:29
  • @FilipeTeixeira If you want to keep them as `factors` just wrap it with `factor` as mentioned by lmo. So for David's answer try, `factor(df[cbind(1:nrow(df), max.col(!is.na(df)))])`. – Ronak Shah Mar 20 '17 at 13:35
  • I would go with `pmax(as.character(df$Col1), as.character(df$Col2), na.rm = TRUE)` – talat Mar 20 '17 at 13:47
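
The three base R suggestions in the comments above can be compared side by side. The following is a minimal sketch, not anyone's definitive solution: the data frame `df` is rebuilt here from the table in the question, and it assumes that both columns share the levels A, B and C (the question does not state the levels explicitly). Each approach goes through character values, so the result is wrapped in `factor()` with explicit `levels` to restore the original level set.

```r
# Illustrative reconstruction of the question's data.
# Assumption: both columns share the same levels A, B, C.
lv <- c("A", "B", "C")
df <- data.frame(
  Col1 = factor(c("A", "B", NA, "A", NA, "A", "B"), levels = lv),
  Col2 = factor(c(NA, NA, "C", "A", "B", NA, "B"), levels = lv)
)

# lmo: ifelse() on character values, then back to a factor
v_ifelse <- factor(
  ifelse(is.na(df$Col1), as.character(df$Col2), as.character(df$Col1)),
  levels = lv
)

# David Arenburg / Ronak Shah: pick one non-NA column per row by matrix indexing
v_index <- factor(
  df[cbind(seq_len(nrow(df)), max.col(!is.na(df)))],
  levels = lv
)

# talat: pmax() on character values, ignoring NAs
v_pmax <- factor(
  pmax(as.character(df$Col1), as.character(df$Col2), na.rm = TRUE),
  levels = lv
)

df$Col3 <- v_ifelse
```

Passing `levels = lv` matters: without it, `factor()` keeps only the levels that actually occur in the merged values, which may differ from the original level set.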

1 Answer


I think this should do it using dplyr:

library(dplyr)

# Take Col2 where Col1 is NA, otherwise Col1
dat %>%
  mutate(Col3 = if_else(is.na(Col1), Col2, Col1))
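
As a possible alternative, dplyr also provides `coalesce()`, which returns the first non-missing value at each position. The sketch below is only an illustration: `dat` is rebuilt here from the table in the question, and it assumes a reasonably recent dplyr version and that both columns share the levels A, B and C.

```r
library(dplyr)

# Illustrative reconstruction of the question's data.
# Assumption: both columns share the same levels A, B, C.
lv <- c("A", "B", "C")
dat <- data.frame(
  Col1 = factor(c("A", "B", NA, "A", NA, "A", "B"), levels = lv),
  Col2 = factor(c(NA, NA, "C", "A", "B", NA, "B"), levels = lv)
)

# Col3 takes Col1 where it is present and falls back to Col2 otherwise
res <- dat %>%
  mutate(Col3 = coalesce(Col1, Col2))

# Inspect the type and levels of the new column
str(res$Col3)
```

If the two columns had different level sets, the behavior would depend on the dplyr version, so aligning the levels first is the safer route.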
Lucy
  • It works but somehow it converts my factors into numeric. No idea why. – FilipeTeixeira Mar 20 '17 at 13:35
  • @FilipeTeixeira `ifelse()` tends to do that. `dat %>% mutate(Col3 = as.factor(ifelse(is.na(Col1),as.character(Col2), as.character(Col1))))` should fix this. – wjchulme Mar 20 '17 at 13:37
  • Answers to this question http://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects explain why `ifelse()` does that and provide a solution. Hint: use `dplyr::if_else` – wjchulme Mar 20 '17 at 13:42
  • Nice! I updated to use `if_else` (thank you, @wjchulme!). I default to `stringsAsFactors=FALSE` (or using a `tibble`) so I didn't notice it was doing that. – Lucy Mar 20 '17 at 13:44
  • @wjchulme actually this is amazingly useful :D. Love it. Thanks – FilipeTeixeira Mar 20 '17 at 13:51
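
The behavior discussed in these comments can be reproduced with a small example. This is a sketch with made-up vectors `f1` and `f2` (not the asker's data): base `ifelse()` builds its result from a plain logical vector and drops the factor attributes, so what comes back are the underlying integer codes rather than the labels.

```r
# Hypothetical small factors sharing the levels A, B, C
lv <- c("A", "B", "C")
f1 <- factor(c("A", NA, "B"), levels = lv)
f2 <- factor(c(NA, "C", "B"), levels = lv)

# Base ifelse() drops the factor class: the result holds integer codes (1 3 2)
codes <- ifelse(is.na(f1), f2, f1)
print(codes)
print(is.factor(codes))

# Converting to character first, then back to a factor, keeps labels and levels
merged <- factor(
  ifelse(is.na(f1), as.character(f2), as.character(f1)),
  levels = lv
)
print(merged)
```

Here the codes 1, 3, 2 correspond to the labels A, C, B, which is why the merged column looked numeric.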