Thought this would be straightforward, but I had some trouble tracking down an elegant way to search all columns in a dataframe at the same time for a partial string match. Basically, how would I apply `df['col1'].str.contains('^')` to an entire dataframe at once and filter down to any rows that have records containing the match?

- You want to search an entire dataframe rather than just a specific column? – EdChum Oct 29 '14 at 20:45
- The `str.contains` method is only valid for Series, so you'd have to do something like `for col in df: df[col].str.contains('^')` – EdChum Oct 29 '14 at 20:48
8 Answers
The `Series.str.contains` method expects a regex pattern (by default), not a literal string. Therefore `str.contains("^")` matches the beginning of any string. Since every string has a beginning, everything matches. Instead use `str.contains(r"\^")` to match the literal `^` character.
To check every column, you could use `for col in df` to iterate through the column names, and then call `str.contains` on each column:
import numpy as np

mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]
Alternatively, you could pass `regex=False` to `str.contains` to make the test use the Python `in` operator; but (in general) using regex is faster.
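For instance, a minimal sketch of the `regex=False` variant (the toy dataframe here is hypothetical, just for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["x^y", "plain"], "b": ["none", "also^"]})

# with regex=False the pattern is treated as a literal substring
# (a Python `in` test), so the caret needs no escaping
mask = np.column_stack([df[col].str.contains("^", regex=False, na=False)
                        for col in df])
df.loc[mask.any(axis=1)]  # both rows contain a literal "^" somewhere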

- Hey @unutbu, question for you. Why do you use `np.column_stack` when you could use `pd.DataFrame(...).transpose()`? – propjk007 Dec 01 '15 at 17:42
- When `mask` is a boolean NumPy array, `df.loc[mask]` selects rows where the `mask` is True. If `mask` is a DataFrame, however, then `df.loc[mask]` selects rows from `df` whose *index* value matches the index value in `mask` which corresponds to a True value. This alignment of indices is wonderful when you need it, but slows down performance when you don't. So in short, if you don't need the index, use a NumPy array instead of a DataFrame. Also, creating the DataFrame is significantly slower than creating the NumPy array, so there is no advantage to using `pd.DataFrame([...]).T` here. – unutbu Dec 01 '15 at 18:52 (a short demonstration of this alignment point follows the comments below)
- I did not think of the performance effect of the DataFrame approach. It was, more or less, about avoiding another module (numpy), and I thought that using functions from the same library (pandas) would be better. I see how your methodology is better in the long run. Thanks @unutbu! – propjk007 Dec 01 '15 at 22:02
- @unutbu what do you think about `mask = df.apply(lambda x: x.str.contains(r'\^', na=False))` instead of `np.column_stack`? – Zero Sep 18 '17 at 12:06
- @Zero: That works fine too. On the plus side, it is a bit shorter to write. On the minus side, it returns a DataFrame instead of a NumPy array. Since we are using `mask` for indexing, only the array values matter, not any ancillary labels. To make sure that Pandas does not do any unneeded index alignment, I tend to prefer using boolean NumPy arrays over Series for boolean indexing (though really, Pandas does the right thing, so it does not matter). In the end, I think which you use boils down to personal taste. – unutbu Sep 18 '17 at 18:29
- If your df has columns of varying dtypes, you need to cast with `df[col].astype('str')` for it to work. – Owlright Nov 09 '18 at 11:41
- @Owlright: Or, you could restrict the iteration to just the string columns: `mask = np.column_stack([... for col in list_of_string_columns])`. – unutbu Nov 09 '18 at 14:58
- @unutbu: Newbie question - the above solution returns the dataframe; what is the proper way to extract the string from the found cell? – Lyman Zerga Apr 12 '22 at 18:17
- My output: `AttributeError: Can only use .str accessor with string values!` – zabop Sep 12 '22 at 07:23
- @zabop Try the mask step as `mask = np.column_stack([df[col].astype('str').str.contains("s", na=False) for col in df])`. See Owlright's comment above about dtypes, and the follow-ups to it, for where I got that suggestion. Note what you see when you try `df.dtypes`; I was seeing that error before I cast everything to a string. Unrelated tip: if you want to allow matches to multiple strings, change `r"\^"` to `pattern` and on the previous line put something like `pattern = '|'.join(['dog', 'canine'])`. – Wayne Sep 15 '22 at 18:18
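A minimal sketch (with hypothetical toy data) of the index-alignment point discussed in the comments above:
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["a^b", "plain"]}, index=[10, 20])

# boolean NumPy array: purely positional, no index alignment
arr_mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[arr_mask.any(axis=1)]  # selects the row with index 10

# boolean Series from df.apply: selection goes through label alignment
frame_mask = df.apply(lambda col: col.str.contains(r"\^", na=False))
df[frame_mask.any(axis=1)]    # same row, selected by matching index labels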
Try with:
df.apply(lambda row: row.astype(str).str.contains('TEST').any(), axis=1)
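The lambda produces a boolean mask with one entry per row, so to actually filter the dataframe you would wrap it in `df[...]` (a sketch, with 'TEST' standing in for your search string):
mask = df.apply(lambda row: row.astype(str).str.contains('TEST').any(), axis=1)
matches = df[mask]  # only the rows where some cell contains 'TEST'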

- To make your search case-independent: `df.apply(lambda row: row.astype(str).str.contains('TEST'.lower(), case=False).any(), axis=1)` – Brad123 Oct 06 '20 at 01:43
- Very slow compared with the best answer: https://stackoverflow.com/a/26641085/2268280 – restrepo Oct 21 '21 at 03:11
Here's a function to solve the problem of doing a text search in all columns of a dataframe `df`:
def search(regex: str, df, case=False):
    """Search all the text columns of `df`, return rows with any matches."""
    textlikes = df.select_dtypes(include=[object, "string"])
    return df[
        textlikes.apply(
            lambda column: column.str.contains(regex, regex=True, case=case, na=False)
        ).any(axis=1)
    ]
It differs from the existing answers by both staying in the pandas API and embracing that pandas is more efficient at column processing than row processing. Also, it is packaged as a pure function :-)
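For example, a hypothetical call (the toy dataframe is mine, not from the answer):
import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"],
                   "note": ["has ^ caret", "plain"],
                   "n": [1, 2]})
search(r"\^", df)  # returns only the first row; the numeric column "n" is never searched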
Relevant docs:

- `DataFrame.select_dtypes`
- `Series.str.contains`

Posting my findings in case someone needs them.
I had a dataframe (360,000 rows) and needed to search across the whole dataframe to find the rows (just a few) that contained the word 'TOTAL' (in any variation, e.g. 'TOTAL PRICE', 'TOTAL STEMS', etc.) and delete those rows.
I finally processed the dataframe in two steps:
FIND COLUMNS THAT CONTAIN THE WORD:
for i in df.columns:
    # print the column's name for every cell in it that starts with 'TOTAL'
    df[i].astype('str').apply(lambda x: print(df[i].name) if x.startswith('TOTAL') else 'pass')
DELETE THE ROWS:
df = df[df['LENGTH/ CMS'].str.contains('TOTAL') != True]
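If the word can appear in any column, the two steps can be collapsed into one (a sketch building on the answers above; the column iteration and `~` negation are my additions):
import numpy as np

# drop every row in which any column contains 'TOTAL'
total_mask = np.column_stack([df[col].astype(str).str.contains('TOTAL', na=False)
                              for col in df])
df = df[~total_mask.any(axis=1)]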

Here is an example using `applymap`. I found the other answers didn't work for me, since they assumed that all data in a column would be strings, causing AttributeErrors. It is also surprisingly fast.
def search(dataFrame, item):
    # keep rows where any cell is a string containing `item`
    mask = dataFrame.applymap(lambda x: isinstance(x, str) and item in x).any(axis=1)
    return dataFrame[mask]
You can easily change the lambda to use regex if needed.
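For instance, a regex variant might look like this (a sketch; the `re`-based helper is my addition):
import re

def search_regex(dataFrame, pattern):
    # same idea, but match each string cell against a compiled regex
    compiled = re.compile(pattern)
    mask = dataFrame.applymap(lambda x: isinstance(x, str) and bool(compiled.search(x))).any(axis=1)
    return dataFrame[mask]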

Yet another solution. This selects for columns of type `object`, which is pandas' type for strings. Other solutions that coerce to str with `.astype(str)` could give false positives if you're searching for a number (and want to exclude numeric columns and only search in strings -- but if you want to include numeric columns in the search, coercing may be the better approach).
As an added benefit, filtering the columns in this way seems to have a performance benefit: on my dataframe of shape `(15807, 35)`, with only 17 of those 35 columns being strings, I see `4.74 s ± 108 ms` per loop as compared to `5.72 s ± 155 ms`.
df[
    df.select_dtypes(object)
    .apply(lambda row: row.str.contains("with"), axis=1)
    .any(axis=1)
]
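If you do want numeric columns searched as well, a sketch of the `.astype(str)` variant mentioned above:
df[
    df.astype(str)  # coerce everything to strings so numbers are searchable too
    .apply(lambda row: row.str.contains("with"), axis=1)
    .any(axis=1)
]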

Building on top of @unutbu's answer https://stackoverflow.com/a/26641085/2839786
I use something like this:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> def search(df: pd.DataFrame, substring: str, case: bool = False) -> pd.DataFrame:
...     mask = np.column_stack([df[col].astype(str).str.contains(substring.lower(), case=case, na=False) for col in df])
...     return df.loc[mask.any(axis=1)]
>>>
>>> # test
>>> df = pd.DataFrame({'col1': ['hello', 'world', 'Sun'], 'col2': ['today', 'sunny', 'foo'], 'col3': ['WORLD', 'NEWS', 'bar']})
>>> df
    col1   col2   col3
0  hello  today  WORLD
1  world  sunny   NEWS
2    Sun    foo    bar
>>>
>>> search(df, 'sun')
    col1   col2  col3
1  world  sunny  NEWS
2    Sun    foo   bar
