  Name                 Text
0    K            IeatApple
1    Y    bananaisdelicious
2    B  orangelikesomething
3    Q           blueBanana
4    C          appleislike
I want to match the values in the 'text' column of the data frame against a list of keywords.
However, the 'text' column mixes lowercase and uppercase letters, so to capture every occurrence I converted the list into a case-insensitive regex as follows.
import re
import pandas as pd
mylist = ['apple', 'banana']
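# Escape each keyword so it is matched literally.
mylist = [re.escape(k) for k in mylist]
# Find every keyword in the 'text' column, ignoring case.
# The (?i) flag has to sit at the start of the pattern; Python 3.11+ raises an error otherwise.
extracted = df['text'].str.findall(f'(?i)({"|".join(mylist)})').apply(set)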
#Matched words are added to the data frame as column.
df['matching'] = extracted.str.join(',')
#keyword counting
s = pd.DataFrame(extracted.tolist()).stack().value_counts()
print(s)
Apple 1
Banana 1
banana 1
apple 1
One problem with this approach is that it counts 'apple' and 'Apple' as different words.
Is there a way to match both the uppercase and lowercase spellings and count them as the same word?
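
For reference, one possible direction is to lowercase each match before collecting it, so that both spellings collapse into a single keyword. The following is only a minimal sketch under some assumptions: the sample frame is rebuilt from the table above with the column named 'text' (as the code uses it), and the names `keywords`, `pattern` and `counts` are introduced here purely for illustration.

```python
import re
import pandas as pd

# Sample data rebuilt from the question; the column name 'text' is assumed
# to match the code in the question rather than the printed header.
df = pd.DataFrame({
    'Name': ['K', 'Y', 'B', 'Q', 'C'],
    'text': ['IeatApple', 'bananaisdelicious', 'orangelikesomething',
             'blueBanana', 'appleislike'],
})

keywords = ['apple', 'banana']
# Escape each keyword so it is matched literally, then join into one pattern.
pattern = '|'.join(re.escape(k) for k in keywords)

# Match case-insensitively, then lowercase every match so that
# 'Apple' and 'apple' become the same string. The set removes
# duplicates within a row; sorted() gives a stable order for joining.
extracted = (
    df['text']
    .str.findall(pattern, flags=re.IGNORECASE)
    .apply(lambda matches: sorted({m.lower() for m in matches}))
)

# Matched keywords as a comma-separated column (empty string when nothing matched).
df['matching'] = extracted.str.join(',')

# One row per matched keyword, then count; rows without matches are dropped.
counts = extracted.explode().dropna().value_counts()
print(counts)
```

With the sample rows, 'apple' and 'banana' should each be counted twice. If the original casing of the matches is not needed at all, lowercasing the whole column first with `df['text'].str.lower()` before calling `findall` would likely give the same counts.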