Pandas - find specific value in entire dataframe

Question

I have a dataframe and I want to search all columns for values that is text 'Apple'. I know how to do it with one column, but how can I apply this to ALL columns? I want to make it a function, so that next time I can directly use it to search for other values in other dateframes.

Thanks.

Possible duplicate of https://stackoverflow.com/questions/26640129/search-for-string-in-all-pandas-dataframe-columns-and-filter — Quazi Marufur Rahman, Dec 03 '18 at 04:19
`(df=='Apple').sum()` This will give the number of rows in all columns where the value = 'Apple — prashanth, Jan 29 '22 at 10:35

score 22 · Answer 1 · edited Mar 30 '23 at 20:14

You can try searching entire dataframe using the below code:

df[df.eq("Apple").any(1)]
# if using pandas version >=1.5, passing positional argument was deprecated
df[df.eq("Apple").any(axis=1)]

Using numpy comparison

df[(df.values.ravel() == "Apple").reshape(df.shape).any(1)]

Timing (pandas version 1.5.2): While the .ravel() approach is initially quicker on smaller datasets, using .eq() is faster on larger datasets.

small_df = pd.DataFrame({"A":list(range(500)), "B":list(range(500, 1000))})
large_df = pd.DataFrame({"A":list(range(100000)), "B":list(range(100000, 200000))})
largest_df = pd.DataFrame({"A":list(range(1000000)), "B":list(range(1000000, 2000000))})

def filter_df_by_value_eq(df, value):
    return df[df.eq(value).any(axis=1)]

def filter_df_by_value_ravel(df, value):
    return df[(df.values.ravel() == value).reshape(df.shape).any(1)]

In [8]: %timeit filter_df_by_value_eq(small_df, 612)
175 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: %timeit filter_df_by_value_ravel(small_df, 612)
78.9 µs ± 215 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [10]: %timeit filter_df_by_value_eq(large_df, 1502964)
307 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [11]: %timeit filter_df_by_value_ravel(large_df, 1502964)
1.56 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [12]: %timeit filter_df_by_value_eq(largest_df, 10502964)
3.04 ms ± 66.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [13]: %timeit filter_df_by_value_ravel(largest_df, 10502964)
15.2 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

score 12 · Answer 2 · answered Dec 03 '18 at 04:48

Something like this:

In [1188]: df
Out[1188]: 
   id   name      n1
0   1   Zeke     may
1   2  Apple    maya
2   3      a   Apple
3   4   Maya       a
4   5  Derek  Mayank
5   6     an      is
6   7    the     the

Just have a check like:

In [1190]: df[df == 'Apple']
Out[1190]: 
   id   name     n1
0 NaN    NaN    NaN
1 NaN  Apple    NaN
2 NaN    NaN  Apple
3 NaN    NaN    NaN
4 NaN    NaN    NaN
5 NaN    NaN    NaN
6 NaN    NaN    NaN

OR

In [1191]: df.where(df == 'Apple')
Out[1191]: 
   id   name     n1
0 NaN    NaN    NaN
1 NaN  Apple    NaN
2 NaN    NaN  Apple
3 NaN    NaN    NaN
4 NaN    NaN    NaN
5 NaN    NaN    NaN
6 NaN    NaN    NaN

This lets you search through all the columns of a dataframe.

score 12 · Answer 3 · answered Dec 03 '18 at 04:50

12

import pandas library

import pandas as pd

Raw Data or URL of file

raw_data = {'first_name': ['Mihir', 'Mihir', 'Raju', 'Johan', 'Johan'],
               'last_name': ['Patel', 'Patel', 'Ali', 'Khan', 'Khan'], 
               'age': [42, 42, 36, 24, 53]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age'])

Find The Value

df.loc[df['first_name']=='Mihir']

answered Dec 03 '18 at 04:50

Pratik Patel

353
2
6

11

This finds the value from a single column. The OP specifically asked for a solution that searched the entire df. – ViennaMike Dec 12 '21 at 03:01

score 0 · Answer 4 · edited Apr 29 '22 at 09:18

0

Search list of values in given dataframe all columns:

filter_data = df[df.isin(['Apple', 'Green Apple']).any(1)]

Search single value in given dataframe all columns:

filter_data = df[df.contains('Apple').any(1)]

edited Apr 29 '22 at 09:18

marc_s

732,580
175
1,330
1,459

answered Apr 28 '22 at 07:59

atiwari

19
2

Are you refering to https://pandas.pydata.org/docs/reference/api/pandas.Series.str.contains.html ? Because dataframe does not have any "contains" method – Tito Rezende May 11 '22 at 01:19
What does `any(n)` do? – not2qubit Jul 28 '22 at 21:58
no .contains method. Tito was correct. – pauljohn32 Jan 14 '23 at 16:27
The `isin()` is quite quick. But the `.contains()` only works on a single column and needs to be prefixed by the column name and `.str` e.g. `df.name.str.contains()` – Pierz Mar 01 '23 at 14:42

score -9 · Answer 5 · answered Dec 03 '18 at 04:09

-9

Take what you have for your single method and include in the for loop.

for column in df:
    #your code here

answered Dec 03 '18 at 04:09

James Marmarou

1
2

Pandas - find specific value in entire dataframe

5 Answers5

import pandas library

Raw Data or URL of file

Find The Value

Linked

Related