I have 2 dataframes
x <- data.frame(ID=c("A","B","C","D"),y=c(1,2,3,4),animal=c("squid","shark", NA, NA))
which looks like
ID y animal
1 A 1 squid
2 B 2 shark
3 C 3 <NA>
4 D 4 <NA>
and
y <- data.frame(ID=c("A","B","C","D"),y=c(1,2,3,4),animal=c(NA,NA, "dog", "cat"))
which looks like
ID y animal
1 A 1 <NA>
2 B 2 <NA>
3 C 3 dog
4 D 4 cat
I want to merge these datasets by their ID so that the animal
columns complete each other. However, when I try to do this using merge
or the joins offered by dplyr
I get these results
#example code
merge(x, y, by = "ID")
#output
ID y.x animal.x y.y animal.y
1 A 1 squid 1 <NA>
2 B 2 shark 2 <NA>
3 C 3 <NA> 3 dog
4 D 4 <NA> 4 cat
What I'd like is for the output to just be
ID y animal
1 A 1 squid
2 B 2 shark
3 C 3 dog
4 D 4 cat
There are ways to go about combining that with something like a loop that creates a final column where the incomplete data is tossed and only complete data is kept, but I'm curious if there are handy functions out there that can fix this with a one-liner
It seems like people have asked similar questions to this, but most of the posts I found were more interested in just renaming the shared column names between the 2 datasets to avoid the .x and .y results merge
produces