I have a pandas DataFrame with a column 'title' and a list called my_list. I want to search for each value of df['title'] in my_list and get the index of the matching list element. The value must match as a whole word, not as part of a longer word.
Regex seemed to be the obvious solution. I take one value at a time from df['title'] into i and concatenate '\b' before and after it to match the whole word only, but I get no match even though the word exists.
for i in df['title']:
    print(i)
    x = ('\b'+i+'\b')
    print(x)
    print([s for s in my_list if re.search(x, s)])
    print(len(i))
    print(len(x))
The output that I get is:
ITEM
ITE
[]
4
6
ITEM_NUMBER_TYPE
ITEM_NUMBER_TYP
[]
16
18
.........and so on
I cannot figure out why ITEM is printed as ITE. When I try
x = ('\b'+i+' \b')
# added a space before the second '\b' to get the whole word
I get the following output:
ITEM
ITEM
[]
4
7
ITEM_NUMBER_TYPE
ITEM_NUMBER_TYPE
[]
16
19
......and so on
Contents of my_list:
print(my_list)
Output:
['ITEM_XFORM_IND IS NOT NULL',
'ITEM IS NOT NULL',
'ITEM_LEVEL IS NOT NULL']
For i = 'ITEM', I want only 'ITEM IS NOT NULL' to match, not the other two entries.
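
A likely cause, offered as a sketch and not a definitive diagnosis: in a normal Python string literal, '\b' is a single backspace character (ASCII 8), not the regex word-boundary token. That would explain why len(x) is only len(i) + 2 and why the printed value appears to lose its last character. Building the pattern from raw strings (r'\b') passes the backslash through to re. In the example below, the list `titles` is a hypothetical stand-in for df['title'], and the names `pattern` and `matches` are introduced only for illustration; re.escape is added on the assumption that a title could contain regex metacharacters.

```python
import re

# Hypothetical stand-in for df['title'], using the values shown in the question.
titles = ["ITEM", "ITEM_NUMBER_TYPE"]
my_list = [
    "ITEM_XFORM_IND IS NOT NULL",
    "ITEM IS NOT NULL",
    "ITEM_LEVEL IS NOT NULL",
]

# '\b' in a normal literal is one backspace character;
# r'\b' is a backslash followed by 'b', which re reads as a word boundary.
assert len("\b") == 1 and len(r"\b") == 2

matches = {}
for title in titles:
    # re.escape protects any regex metacharacters inside the title.
    pattern = re.compile(r"\b" + re.escape(title) + r"\b")
    # Record the indexes of list entries containing the title as a whole word.
    matches[title] = [idx for idx, s in enumerate(my_list) if pattern.search(s)]

print(matches)  # {'ITEM': [1], 'ITEM_NUMBER_TYPE': []}
```

Because '_' counts as a word character for \b, 'ITEM' does not match inside 'ITEM_XFORM_IND' or 'ITEM_LEVEL', so only index 1 is returned for it.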