3

I have to datasets, old and new. Both should have three columns in common: mmsi,d,ts.

old
    mmsi d  lat     lon     ts
0   0   93  40.3601 122.179 1458213600
1   0   93  40.3601 122.179 1458214200
2   0   93  40.3601 122.179 1458214800
3   0   93  40.3601 122.179 1458217200
4   0   93  40.3601 122.179 1458220800

new

    mmsi d   ts          lat            lon
0   0   77  1458211800  46.3675533333   48.01107
1   0   77  1458212400  46.3675533333   48.01107
2   0   77  1458213000  46.3675533333   48.01107
3   0   77  1458215400  46.3675533333   48.01107
4   0   77  1458219000  46.3675533333   48.01107

The two sets should have mmsi,d,ts in common, but when I perform a merge, the shape of the resulting data frame is much different than the shape of old or new.

Do I use Merge of Join in this instance? How can I combine both dataframes so that I can compare latand lon?

Demetri Pananos
  • 6,770
  • 9
  • 42
  • 73
  • Sorry are you asking how to tell if some values are only present in left/right/both? if so you can do `old.merge(new, on=['mms'i,'d','ts'], how='outer', indicator=True)` this will add `_merge` column telling where the data is present in left/right/both – EdChum Mar 17 '16 at 18:00
  • Im interested in the mmsi, d, ts couples that are in both data sets. So will that line of code do the trick? – Demetri Pananos Mar 17 '16 at 18:06
  • No you can remove `how` param ` or pass `how='inner'` to find only where values exist in both dfs – EdChum Mar 17 '16 at 18:10

1 Answers1

3

You can try merge with parameter how='inner':

print pd.merge(new,old,on=['mmsi','d','ts'], how='inner')

Sample:

print new
   mmsi   d          ts        lat       lon
0     0  77  1458213600  46.367553  48.01107
1     0  77  1458214200  46.367553  48.01107
2     0  77  1458213000  46.367553  48.01107
3     0  77  1458215400  46.367553  48.01107
4     0  77  1458217200  46.367553  48.01107

print old
   mmsi   d       lat      lon          ts
0     0  77  48.01107  122.179  1458213600
1     0  77  48.01107  122.179  1458214200
2     0  77  48.01107  122.179  1458214800
3     0  77  48.01107  122.179  1458217200
4     0  77  48.01107  122.179  1458220800

#added custom suffixes
print pd.merge(new,old,on=['mmsi','d','ts'], how='inner', suffixes=('_new','_old'))
   mmsi   d          ts    lat_new   lon_new   lat_old  lon_old
0     0  77  1458213600  46.367553  48.01107  48.01107  122.179
1     0  77  1458214200  46.367553  48.01107  48.01107  122.179
2     0  77  1458217200  46.367553  48.01107  48.01107  122.179
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252