Checking if elements in an array exist in a pandas DataFrame

Question

I have a pandas DataFrame and a pandas Series that look like this:

df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})

  col1 col2 col3
0    a    b    d
1    b    c    f
2    c    e    g
3    d    f    a

df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])

col1    b
col2    g
col3    g
dtype: object

As you can see, the columns of df0 and the indices of df1 are the same. For each index of df1, I want to know if the value at that index exists in the corresponding column of df0. So, df1.col1 is b and we need to look for b only in df0.col1 and check if it exists.

Desired output:

array([True, False, True])

Is there a way to do this without using a loop? Maybe a method native to numpy or pandas?

Could use something like `melt` to convert both data frames from wide to long, and then `join` them on col and value? — TMBailey, Sep 18 '21 at 06:32
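A minimal sketch of the melt-and-join idea from the comment above, assuming the df0 and df1 from the question. The intermediate names (long0, long1, merged) and the column labels 'col' and 'value' are illustrative choices, not part of the original post.

```python
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# Wide -> long: one row per (column name, value) pair found in df0.
long0 = df0.melt(var_name='col', value_name='value').drop_duplicates()

# Put the Series into the same two-column layout.
long1 = df1.rename_axis('col').reset_index(name='value')

# A left join keeps one row per df1 entry; the indicator column records
# whether a matching (col, value) pair was found in df0.
merged = long1.merge(long0, on=['col', 'value'], how='left', indicator=True)
exists = (merged['_merge'] == 'both').to_numpy()
print(exists)
```

This is considerably more work than the comparison-based answers below, so it is probably only worth it if the data is already in long form.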

score 5 · Answer 1 · edited Sep 18 '21 at 08:11

The pandas.DataFrame.eq method is probably the simplest approach.

df0.eq(df1).any()

col1     True
col2    False
col3     True
dtype: bool
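A self-contained sketch of the same call, assuming the df0 and df1 from the question. As far as I understand, eq matches the Series index to the DataFrame columns by label, so the order of the Series should not matter; the shuffled Series here is a hypothetical input added only to illustrate that. The to_numpy() call gives the array form asked for in the question.

```python
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# Same values as df1, but with the index in a different order
# (illustrative input, not from the question).
shuffled = df1[['col3', 'col1', 'col2']]

# eq compares each column against the Series entry with the same label;
# any() then reduces each column to a single boolean.
result = df0.eq(df1).any()
result_shuffled = df0.eq(shuffled).any()

# Convert the boolean Series to a plain NumPy array.
as_array = result.to_numpy()
print(as_array)
```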

edited Sep 18 '21 at 08:11 by mozway

answered Sep 18 '21 at 07:39 by StevenS

Good! +1 Could add `to_numpy()` to convert the series to a numpy array as in the desired output. – SeaBean Sep 18 '21 at 08:26

tdy · Answer 2 · 2021-09-18T07:11:43.960

Using numpy

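You can broadcast df1 against df0. Newer pandas releases reject multi-dimensional indexing such as df1[None, :] on a Series, so convert it to a NumPy array first:

np.any(df1.to_numpy()[None, :] == df0, axis=0)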
# col1     True
# col2    False
# col3     True
# dtype: bool

Note that this assumes df1.index and df0.columns are in the same order. If not, reindex first:

np.any(df1.reindex(df0.columns).to_numpy()[None, :] == df0, axis=0)
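To illustrate why the ordering matters, here is a small sketch that assumes the df0 from the question and a hypothetical shuffled Series (same label/value pairs as df1, different order). It works on plain NumPy arrays, where the comparison is purely positional.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})

# Same label/value pairs as df1 in the question, but in a different order.
shuffled = pd.Series(['g', 'b', 'g'], index=['col3', 'col1', 'col2'])

# Positional comparison: the first value is compared with the first column,
# regardless of labels, so the answer is wrong for the shuffled input.
positional = (df0.to_numpy() == shuffled.to_numpy()[None, :]).any(axis=0)

# Reindexing puts the values in df0's column order before comparing.
aligned_values = shuffled.reindex(df0.columns).to_numpy()
aligned = (df0.to_numpy() == aligned_values[None, :]).any(axis=0)

print(positional)
print(aligned)
```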

Using pandas

Use apply with isin to check whether each df1 value occurs in the corresponding column of df0:

df0.apply(lambda col: col.isin([df1[col.name]])).any()
# col1     True
# col2    False
# col3     True
# dtype: bool

Mykola Zotko · Accepted Answer · score 0 · 2021-09-18T10:27:13.670

You can make use of broadcasting:

(df0 == df1).any().values

It also works with NumPy ndarrays:

# df1 is a Series, so its labels live in .index, not .columns
assert (df0.columns == df1.index).all()

(df0.values == df1.values).any(axis=0)

Output:

array([ True, False,  True])

edited Sep 18 '21 at 10:27

answered Sep 18 '21 at 08:54 by Mykola Zotko


This is virtually the same as [@StevenS' answer](https://stackoverflow.com/a/69232614/16343464). Or should I propose `df0.__eq__(df1).any()`? :p – mozway Sep 18 '21 at 08:58

I personally think this doesn't add value here; if every syntax variant were posted as a separate answer, it would be unbearable (I am only saying this because you answered much later than the other answer). You could instead add a comment on (or edit) the other answer. – mozway Sep 18 '21 at 09:07
`eq` and `==` are different functions and you can get different results with parameters in `eq`. Plus I explain that you use broadcasting here. – Mykola Zotko Sep 18 '21 at 09:13
Could you explain how `eq` and `==` are different? Also, for the numpy case, why does it work without `[None,:]`? Sorry, these are really basic questions. – Sep 18 '21 at 21:44
You can change the `axis` parameter in `eq` and get different results `df0.eq(df1, axis='rows')`. It works without `[None,:]` because of broadcasting (it does it automatically). – Mykola Zotko Sep 19 '21 at 17:00
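A short sketch of the broadcasting point from the last comment, assuming the df0 and df1 from the question. The variable names a, b, explicit and implicit are illustrative. NumPy compares shapes from the trailing dimension backwards, so a (3,) array is treated like a (1, 3) array when compared with a (4, 3) array.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

a = df0.to_numpy()  # 2-D array, one row per DataFrame row
b = df1.to_numpy()  # 1-D array, one entry per column

# Adding the leading axis by hand ...
explicit = a == b[None, :]
# ... gives the same result as letting NumPy broadcast the 1-D array.
implicit = a == b

print(a.shape, b.shape, b[None, :].shape)
print(np.array_equal(explicit, implicit))
print(implicit.any(axis=0))
```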

score -1 · Answer 4 · answered Sep 18 '21 at 07:03

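import pandas as pd

df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})
df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])

# Loop over the labels in df1 instead of hard-coding 'col1'..'col3'.
# Note: str.contains matches substrings, not whole values;
# regex=False treats the search value as a literal string.
results = []
for col in df1.index:
    results.append(df0[col].str.contains(df1[col], regex=False).any())
print(results)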

score -2 · Answer 5 · answered Sep 18 '21 at 06:30

If you'd like a quick one liner using list comprehension:

[df1[i] in df0[i].unique() for i in df1.index]

And if it needs to be an array:

np.array([df1[i] in df0[i].unique() for i in df1.index])

The output is:

array([ True, False, True])

answered Sep 18 '21 at 06:30 by Machetes0602
