Select rows with different df conditions

Question

I have df1:

           A           B   C
0   15:00:00  2002-01-13   8
1   15:00:00  2002-01-13   9
2   15:30:00  2002-01-13   8
3   15:30:00  2002-01-13   9
4   16:00:00  2002-01-13   8
5   16:00:00  2002-01-13   9
6   15:00:00  2002-01-14  17
7   15:00:00  2002-01-14  19
8   15:30:00  2002-01-14  17
9   15:30:00  2002-01-14  19
10  15:00:00  2002-01-15  38
11  15:00:00  2002-01-15  40
12  15:30:00  2002-01-15  38
13  15:30:00  2002-01-15  40
14  16:00:00  2002-01-15  38
15  16:00:00  2002-01-15  40

And df2:

           A           B   C
 0  16:00:00  2002-01-13   9
 1  16:00:00  2002-01-15  38

I want to get a new df3 selecting the rows in df1 where:

df1["B"] == df2["B"] and df1["C"] == df2["C"]

df3 should be:

           A           B  C
1   15:00:00  2002-01-13  9  # df1["B"] == df2["B"] and df1["C"] == df2["C"]
3   15:30:00  2002-01-13  9
5   16:00:00  2002-01-13  9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38

have you looked at [this question](https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe)? and does it have your needs? obvisouly they are asking for `not in` so just remove the `~` from the solution. — MattR, Mar 06 '18 at 17:42

Scott Boston · Accepted Answer · 2018-03-06T17:49:33.213

2

You can use pd.merge:

df1.merge(df2, on=['B','C'],suffixes=('','_y')).drop('A_y',axis=1)

          A           B   C
0  15:00:00  2002-01-13   9
1  15:30:00  2002-01-13   9
2  16:00:00  2002-01-13   9
3  15:00:00  2002-01-15  38
4  15:30:00  2002-01-15  38
5  16:00:00  2002-01-15  38

Another option is use boolean index:

df1[df1.B.isin(df2.B) & df1.C.isin(df2.C)]

           A           B   C
1   15:00:00  2002-01-13   9
3   15:30:00  2002-01-13   9
5   16:00:00  2002-01-13   9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38

edited Mar 06 '18 at 17:49

answered Mar 06 '18 at 17:46

Scott Boston

147,308
15
139
187

1

Didn't know about the suffixes argument, nice - will save me some lines in future. – sjw Mar 06 '18 at 17:48
For example 2002-01-15 , 9 you will include here by using isin – BENY Mar 06 '18 at 17:53

score 1 · Answer 2 · answered Mar 06 '18 at 17:43

IIUC.I usually do this in R and seems like also work in python

df1.loc[df1[['B','C']].astype(str).sum(1).isin(df2[['B','C']].astype(str).sum(1)),:]
Out[75]: 
           A           B   C
1   15:00:00  2002-01-13   9
3   15:30:00  2002-01-13   9
5   16:00:00  2002-01-13   9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38

score 1 · Answer 3 · answered Mar 06 '18 at 17:47

I would do this as follows:

df3 = pd.merge(df1, df2, on=['B', 'C'])

Which gives the following:

        A_x           B   C       A_y
0  15:00:00  2002-01-13   9  16:00:00
1  15:30:00  2002-01-13   9  16:00:00
2  16:00:00  2002-01-13   9  16:00:00
3  15:00:00  2002-01-15  38  16:00:00
4  15:30:00  2002-01-15  38  16:00:00
5  16:00:00  2002-01-15  38  16:00:00

Some cleanup required:

df3.drop('A_y', axis=1, inplace=True)
df3.columns = ['A', 'B', 'C']

Result:

          A           B   C
0  15:00:00  2002-01-13   9
1  15:30:00  2002-01-13   9
2  16:00:00  2002-01-13   9
3  15:00:00  2002-01-15  38
4  15:30:00  2002-01-15  38
5  16:00:00  2002-01-15  38

Select rows with different df conditions

3 Answers3