I am working on an NLP project for work and am stuck on one part. I have a dataframe of requirements stored as strings ('This system shall...'). We want to check every requirement against a list of words, keep only the requirements that contain one or more of those words, and then add a column listing just the words that were found in each requirement.

Requirement               Contained_Words
'This system shall...'    'will', 'actions'

The problem I'm having is that it matches the pattern anywhere in the string, including inside longer words, rather than the exact word, so the output is incorrect.

import pandas as pd

def bad_words(doc: pd.DataFrame):
    # Regex alternation; str.contains matches these as substrings, not whole words
    words = 'will|must|actions'
    # Boolean mask of the rows whose text matches the pattern
    mask = doc['Requirement'].str.contains(words)
    if mask.any():
        df = doc[mask]
        print(df)
    else:
        print(f"No requirements contain the word(s): {words}.")
Jacob L.
  • Does this help? https://stackoverflow.com/questions/3271478/check-list-of-words-in-another-string i.e. split words into a list, then check if any word in that list is in your string – ee-4-me Aug 17 '23 at 15:14
  • It's close, but I'm hoping to avoid loops as much as possible. I could loop over every requirement and use a list comprehension to check whether any of the words are in each requirement. But I don't think I'd be able to pull out exactly which words from the list are in each requirement that way. – Jacob L. Aug 17 '23 at 16:25
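
One possible approach, offered as a sketch rather than a definitive fix: wrap the alternation in `\b` word boundaries so only whole words match, and use `Series.str.findall` to collect the matched words for each row. The function name `find_listed_words`, the case-insensitive matching, and the de-duplication of repeated hits are assumptions made for illustration. The `Requirement` and `Contained_Words` column names come from the question.

```python
import re
import pandas as pd

def find_listed_words(doc: pd.DataFrame, words) -> pd.DataFrame:
    """Hypothetical helper: keep rows containing any listed word as a whole word."""
    # \b on both sides means 'will' no longer matches inside 'willing'
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    # One list of matches per requirement (empty list when nothing matched)
    found = doc["Requirement"].str.findall(pattern, flags=re.IGNORECASE)
    mask = found.str.len() > 0
    result = doc.loc[mask].copy()
    # Assumption: lowercase the hits and drop repeats, keeping first-seen order
    result["Contained_Words"] = found[mask].apply(
        lambda hits: list(dict.fromkeys(h.lower() for h in hits))
    )
    return result

# Example data invented for this sketch
doc = pd.DataFrame({"Requirement": [
    "The system will log all actions.",
    "The operator is willing to transact.",
    "Users must log in; the system Will retry.",
]})
print(find_listed_words(doc, ["will", "must", "actions"]))
```

The second row is dropped because "willing" and "transact" only contain the listed words as substrings. Note that `.apply` still iterates row by row internally; it is used here only to tidy the lists, and `found[mask]` can be assigned directly if duplicates and casing do not matter.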
