0

I have df1:

           A           B   C
0   15:00:00  2002-01-13   8
1   15:00:00  2002-01-13   9
2   15:30:00  2002-01-13   8
3   15:30:00  2002-01-13   9
4   16:00:00  2002-01-13   8
5   16:00:00  2002-01-13   9
6   15:00:00  2002-01-14  17
7   15:00:00  2002-01-14  19
8   15:30:00  2002-01-14  17
9   15:30:00  2002-01-14  19
10  15:00:00  2002-01-15  38
11  15:00:00  2002-01-15  40
12  15:30:00  2002-01-15  38
13  15:30:00  2002-01-15  40
14  16:00:00  2002-01-15  38
15  16:00:00  2002-01-15  40

And df2:

           A           B   C
 0  16:00:00  2002-01-13   9
 1  16:00:00  2002-01-15  38

I want to get a new df3 selecting the rows in df1 where:

  • df1["B"] == df2["B"] and df1["C"] == df2["C"]

df3 should be:

           A           B  C
1   15:00:00  2002-01-13  9  # df1["B"] == df2["B"] and df1["C"] == df2["C"]
3   15:30:00  2002-01-13  9
5   16:00:00  2002-01-13  9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38
Tie_24
  • 625
  • 4
  • 14
  • have you looked at [this question](https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe)? and does it have your needs? obvisouly they are asking for `not in` so just remove the `~` from the solution. – MattR Mar 06 '18 at 17:42

3 Answers3

2

You can use pd.merge:

df1.merge(df2, on=['B','C'],suffixes=('','_y')).drop('A_y',axis=1)

          A           B   C
0  15:00:00  2002-01-13   9
1  15:30:00  2002-01-13   9
2  16:00:00  2002-01-13   9
3  15:00:00  2002-01-15  38
4  15:30:00  2002-01-15  38
5  16:00:00  2002-01-15  38

Another option is use boolean index:

df1[df1.B.isin(df2.B) & df1.C.isin(df2.C)]

           A           B   C
1   15:00:00  2002-01-13   9
3   15:30:00  2002-01-13   9
5   16:00:00  2002-01-13   9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
1

IIUC.I usually do this in R and seems like also work in python

df1.loc[df1[['B','C']].astype(str).sum(1).isin(df2[['B','C']].astype(str).sum(1)),:]
Out[75]: 
           A           B   C
1   15:00:00  2002-01-13   9
3   15:30:00  2002-01-13   9
5   16:00:00  2002-01-13   9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38
BENY
  • 317,841
  • 20
  • 164
  • 234
1

I would do this as follows:

df3 = pd.merge(df1, df2, on=['B', 'C'])

Which gives the following:

        A_x           B   C       A_y
0  15:00:00  2002-01-13   9  16:00:00
1  15:30:00  2002-01-13   9  16:00:00
2  16:00:00  2002-01-13   9  16:00:00
3  15:00:00  2002-01-15  38  16:00:00
4  15:30:00  2002-01-15  38  16:00:00
5  16:00:00  2002-01-15  38  16:00:00

Some cleanup required:

df3.drop('A_y', axis=1, inplace=True)
df3.columns = ['A', 'B', 'C']

Result:

          A           B   C
0  15:00:00  2002-01-13   9
1  15:30:00  2002-01-13   9
2  16:00:00  2002-01-13   9
3  15:00:00  2002-01-15  38
4  15:30:00  2002-01-15  38
5  16:00:00  2002-01-15  38
sjw
  • 6,213
  • 2
  • 24
  • 39