R rbind different sized tables without duplicates

Question

I've been struggling to make this work all day and I can't figure it out. I have two tables x and y that I'd like to combine. The first two columns of y are a subset of x, but y has an extra column. I'd like to combine the two without having duplicates in the first two columns.

x:

y:

A    B    C
-------------
1    2    123
3    4    456
9    10   789

I'd like my result to be:

A    B    C
-------------
1    2    123
3    4    456
5    6    0
7    8    0
9    10   789

How would I go about doing this?

Can you share your tables in valid R syntax? `dput()` makes this easy, just post `dput(x)` and `dput(y)` in your question. It makes it much easier to demonstrate an answer. — Gregor Thomas, May 07 '18 at 14:32
Also, please save the RStudio tag for questions where RStudio is part of the problem - for example if you have R code that runs fine in the R command line but doesn't work in RStudio. — Gregor Thomas, May 07 '18 at 14:33
As for a solution, looks like `merge(x, y, all = T)`. You can easily replace the `NA`s with `0`s afterwards. — Gregor Thomas, May 07 '18 at 14:34
Questions similar to this have definitely come up here before (e.g. [here](https://stackoverflow.com/questions/3402371/combine-two-data-frames-by-rows-rbind-when-they-have-different-sets-of-columns)). Not sure if this is an exact duplicate though. — Tim Biegeleisen, May 07 '18 at 14:36
For some more information about what I've tried.
`z <- rbind(x, y[, c("A", "B")])`
`z <- z[!(duplicated(z)|duplicated(z, fromLast = TRUE)),] ` to remove duplicates
`rbind(z, y, fill=TRUE)` — Henk, May 07 '18 at 14:39
This is called **merging** the data. Not rbinding... so you need to use **merge()** as described by @Gregor above. — Onyambu, May 07 '18 at 14:42
@Gregor Wow works like a charm, I can't believe how much time I've put in this. — Henk, May 07 '18 at 14:42

score 1 · Accepted Answer · answered May 07 '18 at 14:46

What you're describing is a JOIN operation. I'd use dplyr::full_join():

x <- data.frame('A' = c(1,3,5,7,9), 'B' = c(2,4,6,8,10))
y <- data.frame('A' = c(1,3,9), 'B' = c(2,4,10), 'C' = c(123, 456, 789))
dplyr::full_join(x,y)
# Joining, by = c("A", "B")
  A  B   C
1 1  2 123
2 3  4 456
3 5  6  NA
4 7  8  NA
5 9 10 789

full_join will keep all columns from both tables, matching on all identical variables (or whatever variables you choose using the by = argument). When there are not matching values, it returns NA, but you can replace NA with 0 if you'd prefer.

R rbind different sized tables without duplicates

1 Answers1