Merging in R without duplicating columns with same names between datasets

Question

I have 2 dataframes

x <- data.frame(ID=c("A","B","C","D"),y=c(1,2,3,4),animal=c("squid","shark", NA, NA))

which looks like

  ID y animal
1  A 1  squid
2  B 2  shark
3  C 3   <NA>
4  D 4   <NA>

and

y <- data.frame(ID=c("A","B","C","D"),y=c(1,2,3,4),animal=c(NA,NA, "dog", "cat"))

which looks like

  ID y animal
1  A 1   <NA>
2  B 2   <NA>
3  C 3    dog
4  D 4    cat

I want to merge these datasets by their ID so that the animal columns complete each other. However, when I try to do this using merge or the joins offered by dplyr I get these results

#example code 
merge(x, y, by = "ID")

#output 
  ID y.x animal.x y.y animal.y
1  A   1    squid   1     <NA>
2  B   2    shark   2     <NA>
3  C   3     <NA>   3      dog
4  D   4     <NA>   4      cat

What I'd like is for the output to just be

  ID y animal
1  A 1  squid
2  B 2  shark
3  C 3   dog
4  D 4   cat

There are ways to go about combining that with something like a loop that creates a final column where the incomplete data is tossed and only complete data is kept, but I'm curious if there are handy functions out there that can fix this with a one-liner

It seems like people have asked similar questions to this, but most of the posts I found were more interested in just renaming the shared column names between the 2 datasets to avoid the .x and .y results merge produces

Does this answer your question? [Can I replace NAs when joining two data frames with dplyr?](https://stackoverflow.com/questions/39110179/can-i-replace-nas-when-joining-two-data-frames-with-dplyr) — caldwellst, Feb 20 '20 at 05:02
This seems pretty similar, but it seems the first dplyr answer doesn't work when I run it and returns the error message `Error in select(., -var2.x, -var2.y) : unused arguments (-var2.x, -var2.y)` — Jin, Feb 20 '20 at 05:11
As a note, if you want to use `merge()`, you can do something like this. `foo <- merge(x, y, by = intersect(names(x), names(y)), all = TRUE); subset(foo, complete.cases(animal))` — jazzurro, Feb 20 '20 at 05:16
@762 The first `dplyr` answer works if you use column names which is present in your data i.e change from `var2.x` and `var2.y` to `animal.x` and `animal.y`. — Ronak Shah, Feb 20 '20 at 05:22
@jazzurro I think the semicolon doesn't work there because I still get the NA rows in animals (which is odd), but when it's on a separate line it somehow works. Do you notice the same issue? — Jin, Feb 20 '20 at 05:24
@RonakShah I actually ran the exact same code on the same dataframe objects in those answers, but it returned that error message I mentioned above. I just copy pasted everything. The datatable example worked, but the dplyr one did not — Jin, Feb 20 '20 at 05:24
If I copy-paste it as it is, it works fine for me without any error. You have package `MASS` loaded in your environment probably. Try with `dplyr::select` instead. — Ronak Shah, Feb 20 '20 at 05:28
@jazzurro yes, that what I did, but I was confused why the semicolon doesnt work like normally. Was curious to see if there's something wrong in my environment — Jin, Feb 20 '20 at 05:43

score 0 · Answer 1 · answered Feb 20 '20 at 05:07

0

library(dplyr)
x %>%
   left_join(y,by=c("ID","y")) %>%
   mutate(animal = coalesce(animal.x,animal.y))
  ID y animal.x animal.y animal
1  A 1    squid     <NA>  squid
2  B 2    shark     <NA>  shark
3  C 3     <NA>      dog    dog
4  D 4     <NA>      cat    cat


y <- data.frame(ID=c("A","B","C","D"),y=c(1,2,3,4),animal=c(NA,NA, "dog", "cat"),stringsAsFactors = F)

y <- data.frame(ID=c("A","B","C","D"),y=c(1,2,3,4),animal=c(NA,NA, "dog", "cat"),stringsAsFactors = F)

answered Feb 20 '20 at 05:07

Onyambu

67,392
3
24
53

Hmm I somehow got a very different output where it just creates a 3rd animal column that does not complete the data – Jin Feb 20 '20 at 05:12
is your data strings or factors? – Onyambu Feb 20 '20 at 05:15
Ah I see. I fixed to string and see the output is updated – Jin Feb 20 '20 at 05:18

Merging in R without duplicating columns with same names between datasets

1 Answers1