I'm trying to merge two datasets:

dataset 1
id, month, year, postal

dataset 2
id, month, year, postal, Income, name, division

dataset 1
id  year month postal
1   2010 9     j0r1h0
2   2010 8     j0r1h0
....
....
7   2007 6     j3x4p2

dataset 2
id  year month postal name division
1   2010 9     j0r1h0 john starting
2   2010 8     j0r1h0 lili retired

I want to keep all the columns and rows of dataset 1 and add the extra columns from dataset 2, such as Income and division.

I get the wrong result, with duplicated entries for month and year, when I try:

merge(a,b,by=c(postal,month,year,all.x=TRUE)
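For reference, here is a minimal sketch of how the call was presumably meant to be written: column names quoted, and `all.x = TRUE` passed to `merge()` rather than placed inside `c()`. The data frames `a` and `b` are toy versions built only from the sample rows shown above. Dropping `id` from `b` before merging is an assumption, made so that the result does not contain `id.x` and `id.y`.

```r
# Toy versions of the two datasets, built from the sample rows above.
a <- data.frame(
  id     = c(1, 2, 7),
  year   = c(2010, 2010, 2007),
  month  = c(9, 8, 6),
  postal = c("j0r1h0", "j0r1h0", "j3x4p2"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  id       = c(1, 2),
  year     = c(2010, 2010),
  month    = c(9, 8),
  postal   = c("j0r1h0", "j0r1h0"),
  name     = c("john", "lili"),
  division = c("starting", "retired"),
  stringsAsFactors = FALSE
)

keys <- c("postal", "month", "year")

# Keep only the key columns plus the extra columns wanted from b
# (assumption: b's own id column is not needed in the result).
b_extra <- b[, c(keys, "name", "division")]

# Left join: every row of a is kept; rows without a match in b get NA.
res <- merge(a, b_extra, by = keys, all.x = TRUE)
res <- res[order(res$id), ]
print(res)
```

With unique keys in `b`, the result has exactly as many rows as `a`.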

This is my expected result:

id year month postal name division
1   2010 9     j0r1h0 john  starting
2   2010 8     j0r1h0 lili  retired
3   2010 7     j1v3c4 verna starting
4   2009 1     j23c5  Greg  medium
5   2007 1     j2j4d3 Greg  medium
6   2008 2     j2p4s3  na   na
7   2007 6     j3x4p2  na   starting

And this is my result:

id year month postal name division
1   2010 9     j0r1h0 john  starting
2   2010 8     j0r1h0 lili  retired
3   2010 8     j0r1h0  na   na
4   2010 7      na     na   na
5   2010 7     j1v3c4 verna starting
6   2009 1     j23c5  Greg  medium
7   2007 1     j2j4d3 Greg  medium
8   2008 2     j2p4s3  na   na
9   2007 6     j3x4p2  na   starting
9   2007 1     j3x4p2  na   starting
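A left join with `all.x = TRUE` returns more rows than dataset 1 only when some key combination occurs more than once in dataset 2, so it may be worth checking whether `(postal, month, year)` is unique there. The sketch below uses hypothetical rows (not the real data) in which one key appears twice in `b`. It shows the row multiplication, lists the offending keys, and applies one possible fix. Keeping the first row per key is an assumption; the right choice depends on the data.

```r
# Hypothetical data: the key (j0r1h0, 8, 2010) appears twice in b.
a <- data.frame(
  id = c(1, 2), year = c(2010, 2010), month = c(9, 8),
  postal = c("j0r1h0", "j0r1h0"), stringsAsFactors = FALSE
)
b <- data.frame(
  year = c(2010, 2010, 2010), month = c(9, 8, 8),
  postal = c("j0r1h0", "j0r1h0", "j0r1h0"),
  name = c("john", "lili", NA),
  division = c("starting", "retired", NA),
  stringsAsFactors = FALSE
)
keys <- c("postal", "month", "year")

# Duplicated keys in b multiply the matching rows of a.
naive <- merge(a, b, by = keys, all.x = TRUE)
print(nrow(naive) > nrow(a))

# List every row of b whose key combination occurs more than once.
is_dup <- duplicated(b[, keys]) | duplicated(b[, keys], fromLast = TRUE)
dup_keys <- b[is_dup, ]
print(dup_keys)

# One option: keep the first row per key
# (assumption: the first occurrence is the one wanted).
b_unique <- b[!duplicated(b[, keys]), ]
fixed <- merge(a, b_unique, by = keys, all.x = TRUE)
print(nrow(fixed) == nrow(a))
```

Running the duplicate check on the real dataset 2 would show whether this is what produces the extra rows.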

My real dataset is over 200000 x 16.

memile
  • See [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/4996248) and try to format your code please. –  Nov 02 '19 at 20:54
  • Are you sure that your code is `merge(a,b,by=c(postal,month,year,all.x=TRUE)`? This should result in an error. I guess your code is `merge(a,b,by=c(postal,month,year),all.x=TRUE)`? Further, your dataframes are pretty small, so try to provide the data with `dput()` –  Nov 03 '19 at 08:44
  • Yes, you're right, I typed it wrong, but I still don't have the correct answer. – memile Nov 03 '19 at 17:00
  • Hi memile, if you can put some of your data here with `dput`, and then enough code so that we can cut and paste it into our own R session and reproduce your same error, it will be much easier to find someone to provide the help. Thanks :) – mysteRious Nov 04 '19 at 03:42
  • Impossible, there is confidential info in there. – memile Nov 04 '19 at 17:29
  • @memile I would suggest showing only the data you have already shown in the question. Otherwise it is unclear why the problem arises (at least to me). Anyway, sometimes it is helpful to use `join()` from the `plyr` package rather than `merge()`. But I am just guessing here. –  Nov 05 '19 at 07:36
