You can search the entire DataFrame for a value with the code below:
df[df.eq("Apple").any(1)]
# positional arguments to .any() were deprecated in pandas 1.5; on newer versions pass axis by keyword:
df[df.eq("Apple").any(axis=1)]
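To make the two steps visible, here is a minimal sketch on a made-up DataFrame (the column names and values are assumptions for illustration, not taken from the question). `eq` builds a boolean frame, and `any(axis=1)` reduces it to one flag per row:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "fruit": ["Apple", "Banana", "Cherry"],
    "snack": ["Chips", "Apple", "Nuts"],
})

# Boolean frame: True wherever a cell equals "Apple".
hits = df.eq("Apple")

# One boolean per row: True if any column in that row matched.
mask = hits.any(axis=1)

# Keep only the matching rows.
result = df[mask]
print(result)
```

Rows 0 and 1 are kept here because each contains "Apple" in one of its columns.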
Using numpy comparison:
df[(df.values.ravel() == "Apple").reshape(df.shape).any(1)]
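The numpy one-liner can be unpacked as below. This is only a sketch on made-up, string-only data (so `df.values` is an object array); it checks that the reshaped mask agrees with the `.eq()` mask:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "fruit": ["Apple", "Banana", "Cherry"],
    "snack": ["Chips", "Apple", "Nuts"],
})

# Flatten the underlying 2-D array and compare every cell at once.
flat_hits = df.values.ravel() == "Apple"

# Restore the (rows, columns) shape, then reduce across columns.
row_mask = flat_hits.reshape(df.shape).any(axis=1)

# The pandas version should give the same per-row flags.
pandas_mask = df.eq("Apple").any(axis=1).to_numpy()
same = bool(np.array_equal(row_mask, pandas_mask))
print(row_mask, same)
```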
Timing (pandas version 1.5.2):
On small DataFrames the .ravel() approach is faster (about 79 µs vs. 175 µs at 500 rows), but .eq() is roughly five times faster on the larger ones (100,000 and 1,000,000 rows).
import pandas as pd

small_df = pd.DataFrame({"A":list(range(500)), "B":list(range(500, 1000))})
large_df = pd.DataFrame({"A":list(range(100000)), "B":list(range(100000, 200000))})
largest_df = pd.DataFrame({"A":list(range(1000000)), "B":list(range(1000000, 2000000))})
def filter_df_by_value_eq(df, value):
    return df[df.eq(value).any(axis=1)]

def filter_df_by_value_ravel(df, value):
    return df[(df.values.ravel() == value).reshape(df.shape).any(1)]
In [8]: %timeit filter_df_by_value_eq(small_df, 612)
175 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [9]: %timeit filter_df_by_value_ravel(small_df, 612)
78.9 µs ± 215 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [10]: %timeit filter_df_by_value_eq(large_df, 1502964)
307 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [11]: %timeit filter_df_by_value_ravel(large_df, 1502964)
1.56 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [12]: %timeit filter_df_by_value_eq(largest_df, 10502964)
3.04 ms ± 66.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [13]: %timeit filter_df_by_value_ravel(largest_df, 10502964)
15.2 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
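If you are not in IPython, `%timeit` is not available. A rough equivalent with the standard-library `timeit` module might look like the sketch below; the repeat count of 200 is an arbitrary assumption, and the numbers will differ by machine and pandas version:

```python
import timeit

import pandas as pd

def filter_df_by_value_eq(df, value):
    return df[df.eq(value).any(axis=1)]

def filter_df_by_value_ravel(df, value):
    return df[(df.values.ravel() == value).reshape(df.shape).any(axis=1)]

small_df = pd.DataFrame({"A": list(range(500)), "B": list(range(500, 1000))})

# Sanity check first: both functions should select the same rows.
eq_result = filter_df_by_value_eq(small_df, 612)
ravel_result = filter_df_by_value_ravel(small_df, 612)
assert eq_result.equals(ravel_result)

# Time each function; 200 calls is an arbitrary choice for a quick run.
runs = 200
timings = {}
for func in (filter_df_by_value_eq, filter_df_by_value_ravel):
    total = timeit.timeit(lambda: func(small_df, 612), number=runs)
    timings[func.__name__] = total / runs
    print(f"{func.__name__}: {timings[func.__name__] * 1e6:.1f} µs per call")
```

Swap in `large_df` or `largest_df` to see how the gap changes with size.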