1

I want Pandas to search my data frame for the complete string and not a sub-string. Here is a minimal-working example to explain my problem -

data = [['tom', 'wells fargo', 'retired'], ['nick', 'bank of america', 'partner'], ['juli', 'chase', 'director - oil well']]
 
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Place', 'Position'])
 
# print dataframe.
df
val = 'well'
df.loc[df.apply(lambda col: col.str.contains(val, case=False)).any(axis="columns")]

The correct code would have only returned the second row and not the first one

    Name    Place   Position
0   tom wells fargo retired
2   juli    chase   director - oil well

Update - My intention is to have a search that looks for the exact string requested. While looking for "well" the algorithm shouldn't extract out "well. Based on the comments, I understand how my question might be misleading.

desert_ranger
  • 1,096
  • 3
  • 13
  • 26

2 Answers2

1

IIUC, you can use:

>>> df[~df['Position'].str.contains(fr'\b{val}\b')]

   Name        Place             Position
0   tom  wells fargo              retired
2  juli        chase  director - oil well

And for all columns:

>>> df[~df.apply(lambda x: x.str.contains(fr'\b{val}\b', case=False)).any(axis=1)]

   Name        Place             Position
0   tom  wells fargo              retired
2  juli        chase  director - oil well
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • @desert_ranger. I updated my answer. Can you check the second solution please? The key is to use '\b' to match a whole word. – Corralien Jan 10 '23 at 20:29
1

The regular expression anchor \b which is a word boundary is what you want.

I added addtional data to your code to illustrate more:

import pandas as pd
data = [
          ['tom', 'wells fargo', 'retired']
        , ['nick', 'bank of america', 'partner']
        , ['john','bank of welly','blah']
        , ['jan','bank of somewell knwon','well that\'s it']
        , ['juli', 'chase', 'director - oil well']
        ]
 
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Place', 'Position'])
 
# print dataframe.
df
val = 'well'
df.loc[df.apply(lambda col: col.str.contains(fr"\b{val}\b", case=False)).any(axis="columns")]

EDIT In Python3 the string can be substitute with the variable with f in front of " or ' and r will express it as regular expression. Then now you can get the val as you want. Thank @smci

and the output is like this

Name Place Position
3 jan bank of somewell knwon well that's it
4 juli chase director - oil well
  • You don't *need* to escape it with `\\b`. You can simply **escape it with r'\b`**, that's called a raw string regex. See [Regular Expression HOWTO: backslash escape](https://docs.python.org/3/howto/regex.html?highlight=raw%20string#the-backslash-plague) or https://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-prefixes-do-and-what-are-raw-string-literals – smci Jan 10 '23 at 20:56