Compare rows of a dataset with another dataset in R

Question

I have dataset1 with 1400 row and 25 columns, and dataset2 with 400 rows and 5 columns.Both datasets have a column called ID. as a small example, I can illustrate them like below:

dataset1:

ID  c1  c2  c3    c4
12   m   n   5   1/2/2015
5    c   x   4   2/3/2015
45   g   t   47  4/23/2015
45   j   t   3   1/1/2016
61   t   y   12  7/3/2015
3    r   n   18  3/3/2015

dataset2:

 ID  a1    a2
 45   1    1/1/2015
 3    5    2/2/2016
 12   12   4/29/2016

(as you can see ID in dataset2 is a subset of ID in dataset1)

what I want is: for each row of dataset1, if the value in column ID is equal to a value in the column ID of dataset2, then copy the corresponding value of the column a2 of that row of dataseset2 into a new column of dataset1 as below:

  ID  c1  c2  c3     c4           c5
  12   m   n   5   1/2/2015    4/29/2016
  5    c   x   4   2/3/2015       NA
  45   g   t   47  4/23/2015   1/1/2015
  45   j   t   3   1/1/2016    1/1/2015
  61   t   y   12  7/3/2015       NA
  3    r   n   18  3/3/2015    2/2/2016

I appreciate your help.

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. — Jaap, Apr 08 '17 at 21:00
Probably need `match` or `which`, but I often don't answer unless there's a [MCVE] — IRTFM, Apr 08 '17 at 22:49

score 1 · Answer 1 · answered Apr 09 '17 at 15:08

As @42 mentioned, you can use match.

This is an example with match:

# match the ID of df1 with that of df2
# then returns the index of df2 that
# matches df1
# then subset the a2 column using the above index
# then store in a new column in df1
df1$c5 <- df2$a2[match(df1$ID, df2$ID)]

The output of the above code is below:

> df1
  ID c1 c2 c3         c4    c5
1 12  m  n  5 01/02/2015  4/29/2016
2  5  c  x  4 01/02/2015       <NA>
3 45  g  t 47 01/02/2015 01/01/2015
4 45  j  t  3 01/02/2015 01/01/2015
5 61  t  y 12 01/02/2015       <NA>
6  3  r  n 18 01/02/2015 02/02/2016

www · Answer 2 · 2017-04-09T20:34:30.747

din's answer is perfect. The other way to think about is to merge to data frames.

Data Preparation

ex_data1 <- data.frame(ID = c(12, 5, 45, 45, 61, 3),
                       c1 = c("m", "c", "g", "j", "t", "r"),
                       c2 = c("n", "x", "t", "t", "y", "n"),
                       c3 = c(5, 4, 47, 3, 12, 8), 
                       c4 = c("1/2/2015", "2/3/2015", "4/23/2015",
                              "1/1/2016", "7/3/2015", "3/3/2015"),
                       stringsAsFactors = FALSE)

ex_data2 <- data.frame(ID = c(45, 3, 12),
                       a1 = c(1, 5, 12),
                       a2 = c("1/1/2015", "2/2/2016", "4/29/2016"), stringsAsFactors = FALSE)

Solution 1: Merge the data using base R

ex_data3 <- ex_data2[, c("ID", "a2")]
names(ex_data3) <- c("ID", "c5")

m_data <- merge(ex_data1, ex_data3, by = "ID", all = TRUE)

Solution 2: Merge the data using dplyr

library(dplyr)

m_data <- ex_data1 %>%
  left_join(ex_data2, by = "ID") %>%
  select(-a1, c5 = a2)

Compare rows of a dataset with another dataset in R

2 Answers2