-1

I have two dataframes and I want to filter one dataframe if two other values are not present in the other dataframe. Both dataframes share the name of the columns.

For example, dataframe A has:

col1,col2
1     5
-10   15
6     7

and dataframe B has:

col1,col2
6     7
-10   15
-1    5

So in this example, I would like to pick the value pair in A and see if it is present in B.

First row of A has value pair 1,5 and since 1,5 is not present in B that row would be excluded from A. Second and third row of A has value -10,15 and 6,7, and since both are present in B I would like to keep them.

So the desired output of the filtered table A would be:

col1,col2
-10   15
6     7

How can I achieve this?

EDIT: One of the first things I tried was a merge, but the resulting dataframe was actually bigger than the original. Since merge and merging 101 topic was suggested, I will add the real dataframes here.

Dataframe A have latitude, longitude and id columns (id is not the index). It has 363 rows:

    id       lat        lon
0   0   -33.252192  -70.765291
1   1   -33.224300  -70.780249
2   2   -33.251651  -70.797289
3   3   -33.298574  -70.770133
4   4   -33.214315  -70.787822
... ... ... ...
358 499 -33.227614  -70.770126
359 501 -33.299217  -70.770685
360 502 -33.191476  -70.801492
361 503 -33.239037  -70.780278
362 504 -33.263893  -70.762674

Dataframe B has 73096 rows and it also has and id, latitude and longitude. I'm putting here only lat and lon.

        lat        lon
1   -33.260415  -70.713767
2   -33.461718  -70.853525
3   -33.258741  -70.638032
4   -33.544858  -70.578624
8   -33.535512  -70.574188
... ... ...
97724   -33.451817  -70.847999
97725   -33.452225  -70.846520
97726   -33.450841  -70.841494
97729   -33.461407  -70.856090
97730   -33.457633  -70.822085

So I want to see if the lat,lon pair in A is present in B and if not then exclude it from A.

When I do A.merge(B) I get a dataframe that is 1108 rows long.

2 Answers2

0

You can try pandas.merge. Something like df1.merge(df2, how='inner', left_on=['col1','col2'], right_on=['col1','col2']).

(To help you remember, the naming of these arguments comes from an inner join in database terminology)

niklas
  • 81
  • 1
  • 4
0

A simple merge will do

df_out = dfA.merge(dfB)

Output

   col1  col2
0   -10    15
1     6     7

df.merge does an inner join by default.

RichieV
  • 5,103
  • 2
  • 11
  • 24