0

I've been struggling to make this work all day and I can't figure it out. I have two tables x and y that I'd like to combine. The first two columns of y are a subset of x, but y has an extra column. I'd like to combine the two without having duplicates in the first two columns.

x:

A    B
------
1    2
3    4
5    6
7    8
9    10

y:

A    B    C
-------------
1    2    123
3    4    456
9    10   789

I'd like my result to be:

A    B    C
-------------
1    2    123
3    4    456
5    6    0
7    8    0
9    10   789

How would I go about doing this?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Henk
  • 73
  • 8
  • 1
    Can you share your tables in valid R syntax? `dput()` makes this easy, just post `dput(x)` and `dput(y)` in your question. It makes it much easier to demonstrate an answer. – Gregor Thomas May 07 '18 at 14:32
  • 7
    Have you tried to `merge` the tables? – talat May 07 '18 at 14:33
  • Also, please save the RStudio tag for questions where RStudio is part of the problem - for example if you have R code that runs fine in the R command line but doesn't work in RStudio. – Gregor Thomas May 07 '18 at 14:33
  • 2
    As for a solution, looks like `merge(x, y, all = T)`. You can easily replace the `NA`s with `0`s afterwards. – Gregor Thomas May 07 '18 at 14:34
  • 1
    Questions similar to this have definitely come up here before (e.g. [here](https://stackoverflow.com/questions/3402371/combine-two-data-frames-by-rows-rbind-when-they-have-different-sets-of-columns)). Not sure if this is an exact duplicate though. – Tim Biegeleisen May 07 '18 at 14:36
  • `dplyr::left_join` also can be used to accomplish it – patL May 07 '18 at 14:36
  • For some more information about what I've tried.
    `z <- rbind(x, y[, c("A", "B")])`
    `z <- z[!(duplicated(z)|duplicated(z, fromLast = TRUE)),] ` to remove duplicates
    `rbind(z, y, fill=TRUE)`
    – Henk May 07 '18 at 14:39
  • This is called **merging** the data. Not rbinding... so you need to use **merge()** as described by @Gregor above. – Onyambu May 07 '18 at 14:42
  • @Gregor Wow works like a charm, I can't believe how much time I've put in this. – Henk May 07 '18 at 14:42

1 Answers1

1

What you're describing is a JOIN operation. I'd use dplyr::full_join():

x <- data.frame('A' = c(1,3,5,7,9), 'B' = c(2,4,6,8,10))
y <- data.frame('A' = c(1,3,9), 'B' = c(2,4,10), 'C' = c(123, 456, 789))
dplyr::full_join(x,y)
# Joining, by = c("A", "B")
  A  B   C
1 1  2 123
2 3  4 456
3 5  6  NA
4 7  8  NA
5 9 10 789

full_join will keep all columns from both tables, matching on all identical variables (or whatever variables you choose using the by = argument). When there are not matching values, it returns NA, but you can replace NA with 0 if you'd prefer.

divibisan
  • 11,659
  • 11
  • 40
  • 58