0

So I have these two datasets:

    ID      DOB         ID2   count
1   4083    2007-10-01  3625    5
2   4408    2008-07-01  3603    2
3   4514    2007-07-01  3077    3
4   4396    2008-05-01  3413    5
5   4222    2003-12-01  3341    1
6   4291    2000-07-01  3201    5
7   4581    2005-07-01  3836    1
8   4487    2007-01-01  3264    5
9   4916    2009-10-01  3825    1
10  4277    2000-04-01  3381    2

ID       DOB       score1   score2  score3  score4  score5  score6
4291    2000-07-01  2       5       2       2       1       2
4323    2000-07-01  3       3       1       4       2       5
4408    2008-07-01  4       2       5       5       3       5
4222    2003-12-01  2       1       3       2       3       3
4581    2005-07-01  5       1       5       2       3       1
4005    2003-06-01  1       4       2       4       5       3
4718    2009-02-01  2       3       1       5       5       5
4396    2008-05-01  3       5       2       2       2       5
4924    2008-02-01  5       5       4       5       5       4
4083    2007-10-01  4       5       1       3       3       4
4099    2000-05-01  4       3       1       2       1       2
4277    2000-04-01  2       2       1       3       1       1
4487    2007-01-01  2       5       2       4       3       5
4514    2007-07-01  1       3       4       3       1       5
4003    2005-07-01  3       3       4       1       1       3
4366    2008-12-01  4       4       4       4       3       4
4790    2009-07-01  1       3       1       3       1       4
4643    2002-03-01  3       2       3       3       4       3
4475    2009-05-01  1       4       3       3       3       3
4916    2009-10-01  5       1       3       1       2       2

Within dataset2 there are the ID and Dobs from dataset1, along with other rows of IDs of subjects Im not interested in. What I would like to do is to extract the IDs present in both datasets and create a dataset with the "ID2" column from dataset 1 and the other columns from dataset 2. Like so:

ID       DOB         ID2    score1  score2  score3  score4  score5  score6
4394    2004-11-01  3625    2       2       4       2       2       3
4181    2002-04-01  3603    3       1       3       2       2       5
4942    2001-08-01  3077    3       3       5       3       1       5
4765    2003-05-01  3413    1       5       5       2       3       2
4517    2003-03-01  3341    1       2       1       4       1       5
4905    2002-12-01  3201    5       2       4       1       1       5
4636    2004-07-01  3836    3       1       1       4       4       4
4179    2004-08-01  3264    5       2       5       5       4       2
4448    2007-11-01  3825    2       3       5       4       2       4
4218    2006-04-01  3381    1       5       3       4       5       3 

I think the merge function comes into play here but for the life of me I cant seem to get it to work so any help you can give me will be gratefully received.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
googleplex101
  • 195
  • 2
  • 13
  • In the expected output, there is `4394` ID, but in the initial datasets, it is not there. – akrun Jun 05 '15 at 13:27
  • @googleplex, I don't see ID=4394 in your first data.frame (nor in your second one) so it's kind of hard to understand the desired output... – Cath Jun 05 '15 at 13:27
  • Whoops! My apologies, not sure how that happened. going to fix it now – googleplex101 Jun 05 '15 at 13:29
  • 1
    Looking at the help pages of `merge` (or any other function) is a good way to understand how it works as there are examples in it. Try `?merge` – akrun Jun 05 '15 at 13:33

1 Answers1

2

does this answer your need ?

 merge(df1, df2, by.x=c("ID","DOB"))
Cath
  • 23,906
  • 5
  • 52
  • 86