How to 'join' data frames if number is found in set?

Question

So for a project I'm working on at the moment, I have a data frame of numbers (call it A). The data.frame A consists of 1 variable and 300000 rows of distinct numbers.

Besides from A, I have data.frame B and C. Data frame B and C both have 315 variables and share the same structure (have the same datatypes BUT different values/ variable names).

A, B and C all share the same first variable, which is the primary key of them all. A contains the entire list of numbers and B and C both contain a subset of this list.

What I want to do, is match the numbers in the first variable in B and C to the ones in A, if they match then all the data of the belonging row in B and C should get put into A.

In SQL I know that you can update a table and add rows based on the results where it matches. However, I'm not allowed to use any SQL commands in this assignment (so I can't use the sqldf library) and I have no clue how to do this in R.

As an example, lets say you have the following data frames with info:

A.

B.

B_KEY  VAR_B_2   VAR_B_3
2     AB        134
4     AC        135
7     AD        136

C.

C_KEY  VAR_C_2   VAR_C_3
2     BD        250
3     BE        251
5     BF        252
7     BG        253

This should result into:

A.

KEY  VAR_B_2   VAR_B_3  VAR_C_2   VAR_C_3
1       
2      AB       134       BD        250
3                         BE        251   
4      AC       135
5                         BF        252
6
7      AD       136       BG        253

Bummer about not being able to use `sqldf`. – Tim Biegeleisen Oct 02 '19 at 09:08 — Tim Biegeleisen, Oct 02 '19 at 09:08

score 0 · Accepted Answer · answered Oct 02 '19 at 09:15

We can use nested merge

merge(merge(A, B, by.x = "KEY", by.y = "B_KEY", all.x = TRUE), C,  
    by.x = "KEY", by.y = "C_KEY", all.x = TRUE)

However, if we keep the column names for "KEY" column same in all the dataframes it becomes a little simpler and shorter with Reduce

names(B)[1] <- "KEY"
names(C)[1] <- "KEY"
Reduce(function(x, y) merge(x, y, all.x = TRUE), list(A, B, C))

# KEY VAR_B_2 VAR_B_3 VAR_C_2 VAR_C_3
#1   1    <NA>      NA    <NA>      NA
#2   2      AB     134      BD     250
#3   3    <NA>      NA      BE     251
#4   4      AC     135    <NA>      NA
#5   5    <NA>      NA      BF     252
#6   6    <NA>      NA    <NA>      NA
#7   7      AD     136      BG     253

How to 'join' data frames if number is found in set?

1 Answers1