
Let's say you have some students:

import pandas as pd

students = [['Jack', 34, 'Sydney'],
            ['Riti', 30, 'Delhi'],
            ['Aadi', 16, 'New York']]
dfObj = pd.DataFrame(students, columns=['Name', 'Age', 'City'])

And now you receive a series like this:

s = pd.Series(['Riti', 30, 'Delhi'], index=['Name', 'Age', 'City'])

I could now use .loc to filter for the criteria like this:

filtered_dfObj = dfObj.loc[(dfObj['Name'] == s['Name']) & (dfObj['Age'] == s['Age'])]
filtered_dfObj = filtered_dfObj.loc[filtered_dfObj['City'] == s['City']]

But if I have a lot of columns, the filter code grows very fast. So it would be best if there were an option like s.isin(dfObj)


Update after 5 answers: These are all good answers - thanks! I have not done any speed tests between the different approaches yet. I personally go with this solution, because it is the most flexible regarding column selection (if that is needed).

gies0r
  • Can maybe help you: [python-pandas-remove-duplicate-columns](https://stackoverflow.com/questions/14984119/python-pandas-remove-duplicate-columns) – Alexandre B. Jun 10 '19 at 13:25
  • Thanks for your quick reply @AlexandreB. Unfortunately this does not look promising, because I want to check whether the content of the series exists in the `dfObj` rows. There is no issue with duplicated columns. – gies0r Jun 10 '19 at 13:28
  • I think `df.equals(s)` might work? – iamklaus Jun 10 '19 at 13:39

5 Answers


Consider the following approach:

# number of full duplicates (rows)
print((dfObj == s).all(axis=1).sum())

If you want to check only some columns, you can filter by column names first:

flt = ['Name', 'Age']
# number of partial duplicates (rows)
print((dfObj[flt] == s[flt]).all(axis=1).sum())
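Assembled with the question's data so the snippet runs end to end (the column subset `flt` is just an example; `dfObj == s` broadcasts the series across rows, aligning on column labels):

```python
import pandas as pd

students = [['Jack', 34, 'Sydney'],
            ['Riti', 30, 'Delhi'],
            ['Aadi', 16, 'New York']]
dfObj = pd.DataFrame(students, columns=['Name', 'Age', 'City'])
s = pd.Series(['Riti', 30, 'Delhi'], index=['Name', 'Age', 'City'])

# element-wise comparison, then require all columns to match per row
full_matches = (dfObj == s).all(axis=1).sum()
print(full_matches)  # 1 -> exactly one row equals the series

# restrict the comparison to a subset of columns
flt = ['Name', 'Age']
partial_matches = (dfObj[flt] == s[flt]).all(axis=1).sum()
print(partial_matches)  # 1
```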

One approach is to convert the DataFrame rows and the Series values to lists and compare them:

import pandas as pd

students = [['Jack', 34, 'Sydney'],
            ['Riti', 30, 'Delhi'],
            ['Aadi', 16, 'New York']]
dfObj = pd.DataFrame(students, columns=['Name', 'Age', 'City'])
s = pd.Series(['Riti', 30, 'Delhi'], index=['Name', 'Age', 'City'])

if s.values.tolist() in dfObj.values.tolist():
    print("Series present in DataFrame")
else:
    print("Series NOT present in DataFrame")


Sujit Dhamale
  • Thanks for your hint that the series data and the df data did not match - that was a copy/paste error, now edited in the question – gies0r Jun 10 '19 at 13:55
  • This approach works for small datasets - many thanks! However, if you want to avoid type conversion, I would tend towards one of the other alternatives. – gies0r Jun 10 '19 at 14:07

Check with

dfObj.apply(tuple,1).isin([tuple(s.tolist())])
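Spelled out (with `dfObj` and `s` as in the question): the expression yields a boolean mask, so it can feed straight into row selection.

```python
import pandas as pd

dfObj = pd.DataFrame([['Jack', 34, 'Sydney'],
                      ['Riti', 30, 'Delhi'],
                      ['Aadi', 16, 'New York']],
                     columns=['Name', 'Age', 'City'])
s = pd.Series(['Riti', 30, 'Delhi'], index=['Name', 'Age', 'City'])

# collapse each row into a tuple, then test membership against the series tuple
mask = dfObj.apply(tuple, axis=1).isin([tuple(s.tolist())])
print(mask.any())   # True -> the series occurs in the frame
print(dfObj[mask])  # the matching row(s)
```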
BENY

Use apply with a lambda to check whether each row (axis=1) is equal to s.

dfObj[dfObj.apply(lambda x: x.equals(s), axis=1)]

Result:

    Name    Age City
1   Riti    30  Delhi
jose_bacoy

If you don't care about the index in the original dataframe, this would work

df.merge(s.to_frame().T, how="inner")

Otherwise (with numpy imported as np), you can do

df[np.all(df.values == s.reindex(df.columns).values, axis=1)]
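Both variants in runnable form, using the question's data. One caveat to hedge: in recent pandas versions, merging the all-`object` one-row frame produced by `to_frame().T` against an `int64` column can raise a merge-dtype error, so this sketch adds an `infer_objects()` call that is not in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Jack', 34, 'Sydney'],
                   ['Riti', 30, 'Delhi'],
                   ['Aadi', 16, 'New York']],
                  columns=['Name', 'Age', 'City'])
s = pd.Series(['Riti', 30, 'Delhi'], index=['Name', 'Age', 'City'])

# inner merge against a one-row frame built from the series
# (infer_objects restores numeric dtypes so the merge keys are compatible)
matched = df.merge(s.to_frame().T.infer_objects(), how="inner")
print(len(matched))  # 1

# index-preserving variant: compare the raw values row by row
matched_idx = df[np.all(df.values == s.reindex(df.columns).values, axis=1)]
print(matched_idx.index.tolist())  # [1]
```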
GZ0