0

I have two data table, the first looks like this (Table1):

enter image description here

The second one looks like this (Table2):

enter image description here

I'm looking for a way to combine those two tables and make it look like this (Table3):

enter image description here

I'll elaborate:

  • Table1 has more rows than Table2.

  • Values in the X column of Table1 may repeat several times ("a", "a", "d", "d", "h", "h", etc.), but each repeated value has a different value under the Y column ("a" - 1, "a" - 2, "d" - 5, "d" - 6, etc.).

  • Some values in the Y column of Table1 may be the same for different values under the "X" column (for example: "b" - 3 and "e" - 3).

  • The "Y" column of Table 1 and the "Y" column of Table2 are the keys for joining the tables (remember that Table1 has more rows than Table2 - meaning that all the values of Table2$Y are found in Table1$Y but not vice versa).

  • Eventually, Table3 should contain only the rows from columns "X" and "Y" from Table1 that have a unique value in column "Z" (forming a 3-column table).

I really hope my explanation is coherent, I find this joining operation very frustrating.

Thank you for your kind assistance. -A

Thomas Fritsch
  • 9,639
  • 33
  • 37
  • 49
Adrian Smith
  • 139
  • 4
  • 1
    Can you give the output of `dput()` for three tables? That way I can answer your question. – tyluRp Nov 24 '17 at 08:04

1 Answers1

0

Definitely easier to do this when you create a minimum reproducible example.

But what you're looking for is an inner_join

library(dplyr)

df3 <- inner_join(df1, df2, by = "Y")
Matt W.
  • 3,692
  • 2
  • 23
  • 46
  • Thank you for your comments. Using inner_join causes redundancy and duplications, is there another recommended way? – Adrian Smith Nov 24 '17 at 11:21
  • If it causes redundancy and duplication, your example isn't indicative of the problem you're trying to solve. This can be solved by providing some of your actual data so that we can work with it and try to solve your problem. The answer provided does give the output you expected in table3. The redundancy might be caused by duplication in your `table2`. But in order to figure that out I would actually need some of your data. – Matt W. Nov 24 '17 at 14:35