8

I have a pandas DataFrame and a pandas Series that look like this:

df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})

  col1 col2 col3
0    a    b    d
1    b    c    f
2    c    e    g
3    d    f    a

df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])

col1    b
col2    g
col3    g
dtype: object

As you can see, the column labels of df0 and the index labels of df1 are the same. For each label in df1, I want to know whether the value at that label exists in the corresponding column of df0. For example, df1.col1 is b, so we need to look for b only in df0.col1 and check whether it exists there.

Desired output:

array([True, False, True])

Is there a way to do this without using a loop? Maybe a method native to numpy or pandas?

Mykola Zotko
  • Could use something like `melt` to convert both data frames from wide to long, and then `join` them on col and value? – TMBailey Sep 18 '21 at 06:32
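
A minimal sketch of the melt-and-join idea from the comment above, assuming `merge` with `indicator=True` is an acceptable stand-in for `join`. The names `long0`, `long1` and `merged` are only illustrative:

```python
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# Wide -> long: one row per (column label, value) pair found in df0.
long0 = df0.melt(var_name='col', value_name='value').drop_duplicates()

# Turn the Series into the same two-column long layout.
long1 = df1.rename_axis('col').reset_index(name='value')

# A left merge keeps the order of df1; the indicator column records
# whether each (col, value) pair was also found in df0.
merged = long1.merge(long0, on=['col', 'value'], how='left', indicator=True)
result = (merged['_merge'] == 'both').to_numpy()
print(result)
```

This is considerably more verbose than a direct comparison, so it is probably only worth it when the data is already in long format.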

5 Answers

5

The pandas.DataFrame.eq method is probably the simplest approach.

df0.eq(df1).any()

col1     True
col2    False
col3     True
dtype: bool
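
For reference, here is a self-contained sketch of the same idea that also converts the result to a NumPy array, as in the desired output. The `shuffled` Series is a made-up variant of df1, included only to show that `eq` matches on labels rather than on position:

```python
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# eq aligns the Series index with the DataFrame columns, so every
# column is compared only against its own value from df1.
result = df0.eq(df1).any().to_numpy()
print(result)  # [ True False  True]

# Hypothetical reordered Series: the comparison is still done by label.
shuffled = df1[['col3', 'col1', 'col2']]
by_label = df0.eq(shuffled).any().to_dict()
```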
mozway
StevenS
  • Good! +1 Could add `to_numpy()` to convert the series to a numpy array as in the desired output. – SeaBean Sep 18 '21 at 08:26
1

Using numpy

You can broadcast df1 to check against df0:

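# Recent pandas versions reject df1[None, :] on a Series, so index the underlying array instead
np.any(df0 == df1.to_numpy()[None, :], axis=0)
# col1     True
# col2    False
# col3     True
# dtype: bool

Note that this compares by position, so it assumes df1.index and df0.columns are in the same order. If not, reindex first:

np.any(df0 == df1.reindex(df0.columns).to_numpy()[None, :], axis=0)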
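
To see why the ordering matters, here is a small sketch on plain NumPy arrays. The `unordered` Series is a made-up variant of df1 with its labels shuffled:

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
# Hypothetical Series holding the same data as df1, in a different order.
unordered = pd.Series(['g', 'b', 'g'], index=['col3', 'col1', 'col2'])

values = df0.to_numpy()

# Positional comparison pairs col1 with 'g', col2 with 'b', col3 with 'g'.
positional = np.any(values == unordered.to_numpy()[None, :], axis=0)

# Reindexing restores the column order before the comparison.
aligned = np.any(values == unordered.reindex(df0.columns).to_numpy()[None, :], axis=0)

print(positional)  # [False  True  True]
print(aligned)     # [ True False  True]
```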

Using pandas

Use apply with isin to check whether each df1 value is in the corresponding column of df0:

df0.apply(lambda col: col.isin([df1[col.name]])).any()
# col1     True
# col2    False
# col3     True
# dtype: bool
tdy
0

You can make use of broadcasting:

(df0 == df1).any().values

It also works with NumPy ndarrays:

assert (df0.columns == df1.index).all()

(df0.values == df1.values).any(axis=0)

Output:

array([ True, False,  True])
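
A short sketch of the broadcasting involved, using hand-written arrays with the same contents as df0 and df1. The names `a` and `v` are only illustrative:

```python
import numpy as np

a = np.array([['a', 'b', 'd'],
              ['b', 'c', 'f'],
              ['c', 'e', 'g'],
              ['d', 'f', 'a']])      # shape (4, 3)
v = np.array(['b', 'g', 'g'])        # shape (3,)

# NumPy lines shapes up from the right, so (3,) is treated as (1, 3)
# and repeated over the 4 rows. Adding the axis explicitly is equivalent.
implicit = a == v
explicit = a == v[None, :]

print(implicit.shape)        # (4, 3)
print(implicit.any(axis=0))  # [ True False  True]
```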
Mykola Zotko
  • This is virtually the same as [@StevenS' answer](https://stackoverflow.com/a/69232614/16343464). Or should I propose `df0.__eq__(df1).any()`? :p – mozway Sep 18 '21 at 08:58
  • I personally think this doesn't add value here. If every syntax variant were proposed for each answer, this would be unbearable (I am just saying this as you answered much later than the other answer). You could instead add a comment on (or edit) the other answer. – mozway Sep 18 '21 at 09:07
  • `eq` and `==` are different functions and you can get different results with parameters in `eq`. Plus I explain that you use broadcasting here. – Mykola Zotko Sep 18 '21 at 09:13
  • Could you explain how `eq` and `==` are different? Also, for the numpy case, why does it work without `[None,:]`? Sorry, these are really basic questions. –  Sep 18 '21 at 21:44
  • You can change the `axis` parameter in `eq` and get different results, e.g. `df0.eq(df1, axis='rows')`. It works without `[None,:]` because broadcasting automatically treats the 1-D array as a single row and repeats it across all rows. – Mykola Zotko Sep 19 '21 at 17:00
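
To illustrate the `axis` point from the comments, here is a minimal sketch. The `row_vals` Series is a made-up example indexed by df0's row labels, not part of the question:

```python
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# With the default axis, eq gives the same result as == here:
# the Series index is matched against the column labels.
same = df0.eq(df1).equals(df0 == df1)

# Hypothetical Series with one value per row of df0.
row_vals = pd.Series(['a', 'c', 'g', 'x'], index=df0.index)

# axis='index' matches the Series index against the row labels instead,
# so each row is compared with its own value.
per_row = df0.eq(row_vals, axis='index').any(axis=1)
print(per_row.tolist())  # [True, True, True, False]
```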
-1
import pandas as pd

df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})
df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])

result = []
# Loop over the labels of df1 instead of hard-coding col1..col3
for col in df1.index:
    # Exact comparison; str.contains would also match substrings and treat the value as a regex
    result.append(bool((df0[col] == df1[col]).any()))
print(result)
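
A caveat for loops like this: an exact comparison and `Series.str.contains` are not interchangeable, because `str.contains` does substring matching and treats its argument as a regular expression by default. A small sketch with made-up values:

```python
import pandas as pd

col = pd.Series(['ab', 'cd', 'ef'])  # hypothetical multi-character values

# Substring match: 'a' occurs inside 'ab', so this reports a hit.
substring_hit = bool(col.str.contains('a').any())

# Exact match: no element is equal to 'a'.
exact_hit = bool(col.eq('a').any())

# By default the pattern is a regex, so '.' matches any character.
regex_hit = bool(col.str.contains('.').any())
literal_hit = bool(col.str.contains('.', regex=False).any())

print(substring_hit, exact_hit, regex_hit, literal_hit)  # True False True False
```

With the single-letter values from the question both approaches happen to agree, which is why the difference is easy to miss.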
-2

If you'd like a quick one-liner using a list comprehension:

[df1[i] in df0[i].unique() for i in df1.index]

And if it needs to be an array:

np.array([df1[i] in df0[i].unique() for i in df1.index])

The output is:

array([ True, False, True])

Machetes0602