Pandas: Check if row exists with certain values

Question

I have a two dimensional (or more) pandas DataFrame like this:

>>> import pandas as pd
>>> df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5

Now suppose I have a numpy array like np.array([2,3]) and want to check if there is any row in df that matches with the contents of my array. Here the answer should obviously true but eg. np.array([1,2]) should return false as there is no row with both 1 in column A and 2 in column B.

Sure this is easy but don't see it right now.

there is both a row and a column with 1 , 2 so what exactly do you mean that there isn't? — John Ruddell, Jul 15 '14 at 14:44
Have you tried something like df.isin([1,2])?? It will output the booleans you need to find the rows that match fully and partially — ZJS, Jul 15 '14 at 14:47
@JohnRuddell: But there is no row which contains a 1 in column A and a 2 in column B. I am looking for semantics like the WHERE clause in SQL. — Robin, Jul 15 '14 at 14:52

score 104 · Accepted Answer · answered Jul 15 '14 at 14:51

104

Turns out it is really easy, the following does the job here:

>>> ((df['A'] == 2) & (df['B'] == 3)).any()
True
>>> ((df['A'] == 1) & (df['B'] == 2)).any()
False

Maybe somebody comes up with a better solution which allows directly passing in the array and the list of columns to match.

Note that the parenthesis around df['A'] == 2 are not optional since the & operator binds just as strong as the == operator.

answered Jul 15 '14 at 14:51

Robin

8,197
11
45
74

9

The solution is good, however the performance is really bad when your data is huge. Try this: df.loc[(df['A'] == 2) & (df['B'] == 3)]. I tested on a dataset my solution's performance is 918 ms, and the one is 2.12 s. – Navy Oct 28 '18 at 08:46
3

@Navy Your solution can't be evaluated to True/False, you need to do something with the resulting DF – fgblomqvist Sep 19 '19 at 15:43
@Navy I'd love to see that as an answer if there is a way to evaluate to a Boolean value – Xgongiveittoya Feb 27 '20 at 18:32
2

@fgblomqvist @Xgongiveittoya Add an `any()` and an `all()` call to receive a Boolean. Like this: `df.loc[(df['A'] == 2) & (df['B'] == 3)].any().all()` – Jan Willem Jun 08 '21 at 10:19

score 35 · Answer 2 · answered Jul 15 '14 at 15:04

35

an easier way is:

a = np.array([2,3])
(df == a).all(1).any()

answered Jul 15 '14 at 15:04

acushner

9,595
1
34
34

This only works if the dimensionality is the same though, right? I.e. if the matching should be based on the first two columns of a five column DataFrame, wouldn't this fail? – Robin Jul 15 '14 at 16:21
5

that's trivial. replace `df` with `df[df.columns[:2]]` – acushner Jul 15 '14 at 16:29
Just in case you try it on a different data structure, try `(df == a).all(0).any()` as well – questionto42 Dec 06 '21 at 12:41

score 14 · Answer 3 · answered Sep 15 '16 at 21:50

14

If you also want to return the index where the matches occurred:

index_list = df[(df['A'] == 2)&(df['B'] == 3)].index.tolist()

answered Sep 15 '16 at 21:50

sparrow

10,794
12
54
74

1

exactly what I was looking for thankss – cestpasmoi Mar 31 '22 at 12:42

score 1 · Answer 4 · edited May 21 '22 at 12:16

1

To find rows where a single column equals a certain value:

df[df['column name'] == value]

To find rows where multiple columns equal different values, Note the inner ():

df[(df["Col1"] == Value1 & df["Col2"] == Value2 & ....)]

edited May 21 '22 at 12:16

moeabdol

4,779
6
44
43

answered May 20 '22 at 12:15

Pantzaris

49
8

Although this is a correct answer, adding context to your code will make it clearer for others to understand! Also note that OP might be considering multiple columns and values. – moeabdol May 21 '22 at 12:10

score 1 · Answer 5 · answered Sep 26 '22 at 13:35

a simple solution with dictionary

def check_existance(dict_of_values, df):
    v = df.iloc[:, 0] == df.iloc[:, 0]
    for key, value in dict_of_values.items():
        v &= (df[key] == value)
    return v.any()

import pandas as pd
df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])

this_row_exists = {'A':2, 'B':3}
check_existance(this_row_exists, df)
# True

this_row_does_not_exist = {'A':2, 'B':5}
check_existance(this_row_does_not_exist, df)
# False

score 0 · Answer 6 · answered May 20 '22 at 12:08

An answer that works with larger dataframes so you don't need to manually check for each columns:

import pandas as pd
import numpy as np

#define variables
df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
a = np.array([2,3])

def check_if_np_array_is_in_df(df, a):
    # transform a into a dataframe
    da = pd.DataFrame(np.expand_dims(a,axis=0), columns=['A','B'])

    # drop duplicates from df
    ddf=df.drop_duplicates()

    result = pd.concat([ddf,da]).shape[0] - pd.concat([ddf,da]).drop_duplicates().shape[0]
    return result

print(check_if_np_array_is_in_df(df, a))
print(check_if_np_array_is_in_df(df, [1,3]))

Kapil_Khatik · Answer 7 · 2023-01-16T11:09:00.347

0

If you want to return the row where the matches occurred:

resulting_row = df[(df['A'] == 2)&(df['B'] == 3)].values

edited Jan 16 '23 at 11:09

answered Jan 16 '23 at 11:07

Kapil_Khatik

1
1
5

Pandas: Check if row exists with certain values

7 Answers7

Linked