
I have a pandas DataFrame with a column 'title', and a list called my_list. I want to search for each value of df['title'] in the strings of my_list and get the index of each matching list element. The value must match as a whole word, not as part of a longer word.

A regex seemed like the obvious solution. I loop over df['title'], storing one value at a time in i, and concatenate '\b' before and after it so that only the whole word matches. However, I get no match even though the word exists in the list.

for i in df['title']:
    print(i)
    x = ('\b'+i+'\b')
    print(x)
    print([s for s in my_list if re.search(x, s)])
    print(len(i))
    print(len(x))

The output that I get is:

ITEM
ITE
[]
4
6
ITEM_NUMBER_TYPE
ITEM_NUMBER_TYP
[]
16
18
... and so on

I cannot figure out why ITEM is printed as ITE. When I instead try

x = ('\b'+i+' \b')
# added a space before the second '\b' to try to match the whole word

I get the following output:

ITEM
ITEM
[]
4
7
ITEM_NUMBER_TYPE
ITEM_NUMBER_TYPE
[]
16
19
... and so on

Contents of my_list:

print(my_list)
Output:
['ITEM_XFORM_IND IS NOT NULL',
 'ITEM IS NOT NULL',
 'ITEM_LEVEL IS NOT NULL']

For i = 'ITEM', I want only 'ITEM IS NOT NULL' to match, not the other two entries.
Syed Afsahul

1 Answer


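In a regular string literal, Python interprets '\b' as the backspace character (\x08), not as the regex word boundary, so the pattern never matches. Use a raw string instead: r'\b'. You can see this in your output: len(x) is only 2 more than len(i), because each '\b' is a single character. The trailing backspace is also the likely reason ITEM is displayed as ITE in your console.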
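As a minimal sketch of the fix, something like the following should work. The list `titles` is a stand-in for df['title'], and the helper name `whole_word_indices` is made up for illustration. The call to re.escape is an addition not mentioned in the question; it is only there in case a title ever contains regex metacharacters.

```python
import re

my_list = [
    'ITEM_XFORM_IND IS NOT NULL',
    'ITEM IS NOT NULL',
    'ITEM_LEVEL IS NOT NULL',
]

# Stand-in for df['title'] (assumption: the column holds plain strings).
titles = ['ITEM', 'ITEM_LEVEL']


def whole_word_indices(word, strings):
    """Return the indices of the strings that contain `word` as a whole word."""
    # r'\b' is the two-character regex word boundary; '\b' would be backspace.
    # re.escape guards against regex metacharacters in `word`.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return [idx for idx, s in enumerate(strings) if pattern.search(s)]


for title in titles:
    print(title, whole_word_indices(title, my_list))
```

Because the underscore counts as a word character in a regex, \bITEM\b does not match inside ITEM_LEVEL or ITEM_XFORM_IND, so 'ITEM' matches only index 1 here.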

RobC
Pete