18

I have a dataframe and I want to search all columns for values that is text 'Apple'. I know how to do it with one column, but how can I apply this to ALL columns? I want to make it a function, so that next time I can directly use it to search for other values in other dateframes.

Thanks.

Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
Chloe
  • 207
  • 1
  • 3
  • 5

5 Answers5

22

You can try searching entire dataframe using the below code:

df[df.eq("Apple").any(1)]
# if using pandas version >=1.5, passing positional argument was deprecated
df[df.eq("Apple").any(axis=1)]

Using numpy comparison

df[(df.values.ravel() == "Apple").reshape(df.shape).any(1)]

Timing (pandas version 1.5.2): While the .ravel() approach is initially quicker on smaller datasets, using .eq() is faster on larger datasets.

small_df = pd.DataFrame({"A":list(range(500)), "B":list(range(500, 1000))})
large_df = pd.DataFrame({"A":list(range(100000)), "B":list(range(100000, 200000))})
largest_df = pd.DataFrame({"A":list(range(1000000)), "B":list(range(1000000, 2000000))})

def filter_df_by_value_eq(df, value):
    return df[df.eq(value).any(axis=1)]

def filter_df_by_value_ravel(df, value):
    return df[(df.values.ravel() == value).reshape(df.shape).any(1)]

In [8]: %timeit filter_df_by_value_eq(small_df, 612)
175 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: %timeit filter_df_by_value_ravel(small_df, 612)
78.9 µs ± 215 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [10]: %timeit filter_df_by_value_eq(large_df, 1502964)
307 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [11]: %timeit filter_df_by_value_ravel(large_df, 1502964)
1.56 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [12]: %timeit filter_df_by_value_eq(largest_df, 10502964)
3.04 ms ± 66.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [13]: %timeit filter_df_by_value_ravel(largest_df, 10502964)
15.2 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
jacob
  • 55
  • 7
Vijay Anand Pandian
  • 1,027
  • 11
  • 23
12

Something like this:

In [1188]: df
Out[1188]: 
   id   name      n1
0   1   Zeke     may
1   2  Apple    maya
2   3      a   Apple
3   4   Maya       a
4   5  Derek  Mayank
5   6     an      is
6   7    the     the

Just have a check like:

In [1190]: df[df == 'Apple']
Out[1190]: 
   id   name     n1
0 NaN    NaN    NaN
1 NaN  Apple    NaN
2 NaN    NaN  Apple
3 NaN    NaN    NaN
4 NaN    NaN    NaN
5 NaN    NaN    NaN
6 NaN    NaN    NaN

OR

In [1191]: df.where(df == 'Apple')
Out[1191]: 
   id   name     n1
0 NaN    NaN    NaN
1 NaN  Apple    NaN
2 NaN    NaN  Apple
3 NaN    NaN    NaN
4 NaN    NaN    NaN
5 NaN    NaN    NaN
6 NaN    NaN    NaN

This lets you search through all the columns of a dataframe.

Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
12

import pandas library

import pandas as pd

Raw Data or URL of file

raw_data = {'first_name': ['Mihir', 'Mihir', 'Raju', 'Johan', 'Johan'],
               'last_name': ['Patel', 'Patel', 'Ali', 'Khan', 'Khan'], 
               'age': [42, 42, 36, 24, 53]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age'])

Find The Value

df.loc[df['first_name']=='Mihir']
Pratik Patel
  • 353
  • 2
  • 6
  • 11
    This finds the value from a single column. The OP specifically asked for a solution that searched the entire df. – ViennaMike Dec 12 '21 at 03:01
0

Search list of values in given dataframe all columns:

filter_data = df[df.isin(['Apple', 'Green Apple']).any(1)]

Search single value in given dataframe all columns:

filter_data = df[df.contains('Apple').any(1)]
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
atiwari
  • 19
  • 2
-9

Take what you have for your single method and include in the for loop.

for column in df:
    #your code here