How do I find an exact word match in a Series from a list of words? (Python 3.11)

Question

I am working on an NLP project for work, and I'm stuck on one part of it. I have a dataframe whose 'Requirement' column contains requirements as strings ('This system shall...'). We want to check every requirement against a list of words, keep only the requirements that contain one or more of those words, and then add a column that contains just the words that were found in each requirement.

Requirement	Contained_Words
'This system shall...'	'will','actions'

The current problem I'm having is that it's matching the pattern of the word, not the exact word (for example, 'will' also matches inside a longer word such as 'willing'), so the output is incorrect.

import pandas as pd

def bad_words(doc: pd.DataFrame):
    # str.contains treats this alternation as a regex, so each word
    # also matches when it is only a substring of a longer word.
    words = 'will|must|actions'
    mask = doc['Requirement'].str.contains(words)
    if mask.any():
        df = doc[mask]
        print(df)
    else:
        print(f"No requirements contain the word(s): {words}.")
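For illustration, here is a minimal sketch of one way to restrict the match to whole words. It assumes that "exact word" means a match delimited by regex word boundaries (`\b`) and that case should be ignored. The helper name `whole_word_mask` and the sample requirements are hypothetical, not part of the original code.

```python
import re
import pandas as pd

def whole_word_mask(series: pd.Series, words: list[str]) -> pd.Series:
    """Boolean mask: True where any of `words` occurs as a whole word."""
    # re.escape protects against regex metacharacters in the word list.
    # The non-capturing group (?:...) keeps pandas from warning about groups.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    # na=False treats missing requirements as "no match" (an assumption).
    return series.str.contains(pattern, case=False, regex=True, na=False)

# Hypothetical sample data
doc = pd.DataFrame({"Requirement": [
    "The system will log all actions.",
    "Transactions show operator willingness.",  # substrings only
    "The system shall respond.",
]})

mask = whole_word_mask(doc["Requirement"], ["will", "must", "actions"])
print(doc[mask])
```

With the plain pattern 'will|must|actions', the second row would also be selected, because 'Transactions' contains 'actions' and 'willingness' contains 'will'. The word boundaries exclude it.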

Does this help? https://stackoverflow.com/questions/3271478/check-list-of-words-in-another-string i.e. split words into a list, then check if any word in that list is in your string — ee-4-me, Aug 17 '23 at 15:14
It's close, but I'm hoping to avoid loops as much as possible. I could loop over every requirement and use a list comprehension to check whether any of the words are in each requirement, but I don't think I'd be able to pull out exactly which words from the list are in each requirement that way. — Jacob L., Aug 17 '23 at 16:25
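As a hedged sketch of how the matched words themselves could be pulled out without an explicit loop over the requirements, `Series.str.findall` returns every match per row. The function name `add_contained_words`, the comma-separated output format, and the sample data are assumptions made for illustration. Note that `apply` still iterates in Python internally.

```python
import re
import pandas as pd

def add_contained_words(doc: pd.DataFrame, words: list[str]) -> pd.DataFrame:
    """Keep rows containing any whole word from `words`; list the hits."""
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    # fillna("") so missing requirements yield an empty list of matches.
    found = doc["Requirement"].fillna("").str.findall(pattern, flags=re.IGNORECASE)
    # Lower-case the hits and drop duplicates, keeping first-seen order.
    contained = found.apply(
        lambda hits: ", ".join(dict.fromkeys(h.lower() for h in hits))
    )
    result = doc.assign(Contained_Words=contained)
    # Rows with no hits have an empty string and are filtered out.
    return result[result["Contained_Words"] != ""]

# Hypothetical sample data
doc = pd.DataFrame({"Requirement": [
    "The system will log all actions.",
    "Transactions must be atomic.",
    "Operator willingness is recorded.",
]})

result = add_contained_words(doc, ["will", "must", "actions"])
print(result)
```

The same pattern drives both the filtering and the new column, so a requirement appears in the result only if at least one whole-word match was found.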

0 Answers