Add a new column to a dataframe using matching values of another dataframe

Question

I am trying to fill in table1 with matching val2 values of table2

table1$New_val2 = table2[table2$pid==table1$pid,]$val2

But I get the warning

longer object length is not a multiple of shorter object length

which is fair enough because the table lengths are not the same.

Please kindly direct me on the correct way to do this.

`merge(table1, table2, by="pid")` optionally add in the `all.x=TRUE` argument if desired. — cory, May 04 '16 at 17:25
hi cory, what if there are other columns in table2 but I only wish to add col2? — andy, May 04 '16 at 17:28

score 49 · Accepted Answer · edited Nov 16 '20 at 12:12


merge(table1, table2[, c("pid", "val2")], by="pid")

Add the all.x=TRUE argument to keep the pids in table1 that have no match in table2; their val2 will be NA.
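As a rough illustration of the difference all.x makes, here is a minimal sketch on made-up data. The toy tables and the extra column named other are assumptions for the example, not the asker's data.

```r
# Hypothetical toy data, invented for illustration only
table1 <- data.frame(pid = c(1, 2, 3, 4), val1 = c("a", "b", "c", "d"))
table2 <- data.frame(pid = c(2, 4, 5),
                     val2 = c(20, 40, 50),
                     other = c("x", "y", "z"))

# Default (inner join): only pids present in both tables are kept
inner <- merge(table1, table2[, c("pid", "val2")], by = "pid")

# all.x = TRUE (left join): every row of table1 is kept,
# and pids without a match in table2 get NA in val2
left <- merge(table1, table2[, c("pid", "val2")], by = "pid", all.x = TRUE)

print(inner)
print(left)
```

Subsetting table2 to c("pid", "val2") before merging is what keeps the other column out of the result.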

You were on the right track. Here's a way using match...

table1$val2 <- table2$val2[match(table1$pid, table2$pid)]
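To see what match is doing here, the sketch below runs it on made-up data (the toy tables are assumptions, not the asker's). match returns, for each pid in table1, the position of its first occurrence in table2, or NA if there is none. The original attempt with == instead compares the two pid vectors element by element with recycling, which is where the length warning comes from.

```r
# Hypothetical toy data, invented for illustration only
table1 <- data.frame(pid = c(3, 1, 2, 4))
table2 <- data.frame(pid = c(2, 4, 5, 1), val2 = c(20, 40, 50, 10))

# Position in table2$pid of each table1$pid (NA where there is no match)
idx <- match(table1$pid, table2$pid)
print(idx)

# Indexing with NA yields NA, so unmatched pids end up with NA in val2
table1$val2 <- table2$val2[idx]
print(table1)
```

The row order of table1 is untouched. If a pid occurs more than once in table2, only the first occurrence is used.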

edited Nov 16 '20 at 12:12

JTFouquier

answered May 04 '16 at 17:39

cory

If the column names aren't the same but the columns hold the same content, would I just state their names in by.x and by.y? Example: pid in table1 is called just that, but in table2 it has another name, e.g. pidx – Lukas Süsslin Apr 04 '23 at 22:53
1

Yup, by.x and by.y are for the case when the index names are different between the two tables. – cory Apr 05 '23 at 13:38
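A minimal sketch of the by.x / by.y case, using the pid and pidx names from the comment above on made-up data (the values are assumptions for illustration):

```r
# Hypothetical toy data; the key is called pid in table1 and pidx in table2
table1 <- data.frame(pid = c(1, 2, 3), val1 = c("a", "b", "c"))
table2 <- data.frame(pidx = c(2, 3, 4), val2 = c(20, 30, 40))

# by.x names the key column in the first table, by.y in the second
merged <- merge(table1, table2, by.x = "pid", by.y = "pidx")
print(merged)
```

The key column in the result takes the name from the first table, so it appears as pid.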

score 8 · Answer 2 · answered May 04 '16 at 17:28


I am not sure if you mean this but you might use:

newtable <- merge(table1, table2, by = "pid")

This will create a new table called newtable, with 3 columns, whose rows are matched on the id column, in this case "pid".

answered May 04 '16 at 17:28

adrian1121

Alexander Kielland · Answer 3 · 2020-01-16T08:30:22.993

4

I'm way late here, but in case anybody else asks the same question:
This is exactly what dplyr's inner_join does.

table1.df <- dplyr::inner_join(table1, table2, by = "pid")

The by argument specifies which column should be used to match the rows.
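For comparison, here is a small sketch of inner_join next to left_join on made-up data. The toy tables are assumptions, and the code is guarded because it only runs if dplyr is installed. Note that inner_join drops rows of table1 with no match, so left_join may be closer to what the question asks for.

```r
# Hypothetical toy data, invented for illustration only
table1 <- data.frame(pid = c(3, 1, 2, 4), val1 = c("c", "a", "b", "d"))
table2 <- data.frame(pid = c(2, 4, 5, 1),
                     val2 = c(20, 40, 50, 10),
                     other = c("x", "y", "z", "w"))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Keep only pid and val2 from table2 so no other columns are added
  lookup <- table2[, c("pid", "val2")]

  # inner_join: rows of table1 that have a match in table2
  joined_inner <- dplyr::inner_join(table1, lookup, by = "pid")

  # left_join: all rows of table1, in their original order, NA where unmatched
  joined_left <- dplyr::left_join(table1, lookup, by = "pid")

  print(joined_inner)
  print(joined_left)
}
```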

EDIT: I used to have so much difficulty remembering it's a [join], and not a [merge].

edited Jan 16 '20 at 08:30

answered Oct 23 '17 at 06:59

I prefer this to `merge()` as the table is not shuffled in the process, although the function is now called `dplyr::inner_join()` – Yollanda Beetroot Jul 24 '19 at 08:07
2

pid also now needs to be in quotes, i.e. table1.df <- dplyr::inner_join(table1, table2, by = "pid") – André.B May 13 '20 at 04:27
