Pandas: What is the difference between isin() and str.contains()?

Question

I want to know if a specific string is present in some columns of my dataframe (a different string for each column). From what I understand isin() is written for dataframes but can work for Series as well, while str.contains() works better for Series.

I don't understand how I should choose between the two. (I searched for similar questions but didn't find any explanation on how to choose between the two.)

score 22 · Accepted Answer · edited Mar 13 '22 at 21:12

.isin checks if each value in the column is contained in a list of arbitrary values. Roughly equivalent to value in [value1, value2].

str.contains checks if arbitrary values are contained in each value in the column. Roughly equivalent to substring in large_string.

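In other words, both methods test each element of the column, but in opposite directions. .isin asks whether the whole element equals one of the listed values, and it is available for all data types. str.contains asks whether the pattern occurs somewhere inside the element, and it makes sense only when dealing with strings (or values that can be represented as strings).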
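A minimal sketch of the analogy above, on small made-up Series (the data and variable names are illustrative, not from the question). It puts the plain-Python expressions next to their pandas counterparts, then shows .isin on a numeric column, where the .str accessor is not available:

```python
import pandas as pd

s = pd.Series(["aa", "ba", "ca"])  # illustrative data

# .isin: is each whole value one of the listed values?
isin_pandas = s.isin(["aa", "ca"]).tolist()
isin_python = [value in ["aa", "ca"] for value in s]

# str.contains: does the substring occur inside each value?
# regex=False makes "b" a literal substring rather than a regex.
contains_pandas = s.str.contains("b", regex=False).tolist()
contains_python = ["b" in value for value in s]

print(isin_pandas, isin_python)
print(contains_pandas, contains_python)

# .isin also works on non-string data.
nums = pd.Series([1, 2, 3])
print(nums.isin([2, 3]).tolist())

# The .str accessor raises AttributeError on a numeric Series.
try:
    nums.str.contains("2")
except AttributeError as exc:
    print("AttributeError:", exc)

# Convert to strings first if matching on the text form is really intended.
print(nums.astype(str).str.contains("2", regex=False).tolist())
```

If the goal is "does the cell equal one of these values", .isin is the natural fit. If the goal is "does the cell contain this text", str.contains is.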

From the official documentation:

Series.isin(values)

Check whether values are contained in Series. Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
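The parameters in that signature change what counts as a match, which is easy to overlook. The sketch below uses made-up data to illustrate regex, case and na. It is only an illustration of the documented parameters, not an exhaustive description of their behavior:

```python
import pandas as pd

s = pd.Series(["a.b", "ab", "A.B", None])  # illustrative data with a missing value

# Default regex=True: "." is a regex wildcard, so any non-empty string matches.
# na=False treats the missing value as a non-match.
as_regex = s.str.contains(".", na=False).tolist()

# regex=False: "." is matched literally.
as_literal = s.str.contains(".", regex=False, na=False).tolist()

# Matching is case-sensitive by default (case=True).
case_sensitive = s.str.contains("A.B", regex=False, na=False).tolist()

# case=False ignores case.
ignore_case = s.str.contains("A.B", case=False, regex=False, na=False).tolist()

print(as_regex)
print(as_literal)
print(case_sensitive)
print(ignore_case)
```

When the search term comes from user input or contains characters such as "." or "(", passing regex=False avoids accidental regex interpretation.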

Examples:

import pandas as pd

df = pd.DataFrame({'a': ['aa', 'ba', 'ca']})
print(df)
#     a
# 0  aa
# 1  ba
# 2  ca

print(df[df['a'].isin(['aa', 'ca'])])
#     a
# 0  aa
# 2  ca

print(df[df['a'].str.contains('b')])
#     a
# 1  ba
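For the situation described in the question (a different string for each column), one possible approach is to keep the search terms in a dict keyed by column name. The frame, column names and terms below are hypothetical. The sketch shows a substring search per column with str.contains, and an exact-match search using DataFrame.isin, which accepts a dict mapping column names to allowed values:

```python
import pandas as pd

# Hypothetical data and search terms, one term per column.
df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Parma"],
    "fruit": ["apple", "pineapple", "pear"],
})
terms = {"city": "Par", "fruit": "apple"}

# Substring search: one boolean column per searched column.
substring_mask = pd.DataFrame({
    col: df[col].str.contains(term, regex=False, na=False)
    for col, term in terms.items()
})

# Exact match: each cell must equal one of the values listed for its column.
exact_mask = df.isin({"city": ["Paris"], "fruit": ["apple"]})

# Is the term present anywhere in each column?
print(substring_mask.any().to_dict())
print(exact_mask.any().to_dict())

# Rows where every column matched its substring term.
print(df[substring_mask.all(axis=1)])
```

Here "pineapple" counts as a substring match for "apple" but not as an exact match, which is the practical difference between the two methods.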

It's worth noting that `pd.Series.isin` exhibits [material differences](https://stackoverflow.com/questions/50779617/pandas-pd-series-isin-performance-with-set-versus-array) versus `value in container` and may be optimized for numeric data, while `pd.Series.str.contains` is always just a Python-level loop. — jpp, Oct 31 '18 at 10:00
It's also worth noting that .Contains, when applied to a db model, will translate into "a is in list b and a is not null". That second half cost me hours of debugging a LINQ query. — John Lord, May 01 '20 at 00:17
