How to check if a string contains substring when both are stored in lists in python?

Question

My main string is in a DataFrame and the substrings are stored in a list. I want to find the matched substring. Here is the code I am using:

from textblob import TextBlob
import pandas as pd

sentence2 = "Previous study: 03/03/2018 (other hospital)  Findings:   Lung parenchyma: The study reveals evidence of apicoposterior segmentectomy of LUL showing soft tissue thickening adjacent surgical bed at LUL, possibly post operation."
blob_sentence = TextBlob(sentence2)
noun = blob_sentence.noun_phrases
df1 = pd.DataFrame(noun)
comorbidity_keywords = ["segmentectomy","lobectomy"]
matches =[]
for comorbidity_keywords[0] in df1:
    if comorbidity_keywords[0] in df1 and comorbidity_keywords[0] not in matches:
       matches.append(comorbidity_keywords)

This gives me a result that is not an actual match. The output should be "segmentectomy", but I get [0,'lobectomy']. I have tried to follow the answer posted in "Check if multiple strings exist in another string". What am I doing incorrectly?

Begin with fixing `for comorbidity_keywords[0] in df1:` - you're essentially iterating over your `DataFrame` storing each row as the first element of your `comorbidity_keywords` list. Replace that line with something like `for keyword in comorbidity_keywords:` and then use `keyword` instead of `comorbidity_keywords[0]` in your `if...` check. — zwer, Mar 10 '19 at 08:53
@zwer I edited it like this: `matches = []`, then `for keyword in comorbidity_keywords:` with the body `if keyword in df1 and keyword not in matches: matches.append(keyword)`. But this gives empty results. — khushbu, Mar 10 '19 at 10:22

Mark Moretto · Accepted Answer · 2019-03-10T13:10:24.907

I don't really use TextBlob, but I have two methods that might help you get to your goal. Essentially, I'm splitting the sentence on spaces and iterating through the resulting words to see if there are any matches. One method returns a list of matched words and the other a dictionary mapping each match's word index to the word.

# If you just want a list of words
def find_keyword_matches(sentence, keyword_list):
    s1 = sentence.split(' ')
    return [i for i in s1 if i in keyword_list]

Then:

find_keyword_matches(sentence2, comorbidity_keywords)

Output:

['segmentectomy']

For a dictionary:

def find_keyword_matches(sentence, keyword_list):
    s1 = sentence.split(' ')
    # Map the position of each matching word (first occurrence) to the word
    return {s1.index(i): i for i in s1 if i in keyword_list}

Output:

{17: 'segmentectomy'}
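
One caveat with both versions above: splitting on a single space leaves punctuation attached to words (so `LUL,` would not match a keyword `LUL`), and `list.index` only ever returns the position of the first occurrence. A minimal sketch of an alternative, assuming a regex tokenizer is acceptable for your text; `find_keyword_positions` is a hypothetical name, not part of the answer above:

```python
import re

def find_keyword_positions(sentence, keyword_list):
    """Return {token_position: token} for every token that is a keyword."""
    keywords = set(keyword_list)  # set gives fast membership tests
    # \w+ keeps runs of letters, digits and underscores, dropping punctuation
    tokens = re.findall(r"\w+", sentence)
    # enumerate yields the true position of each token, repeats included
    return {pos: tok for pos, tok in enumerate(tokens) if tok in keywords}

sentence = "segmentectomy of LUL, then lobectomy; repeat lobectomy."
print(find_keyword_positions(sentence, ["segmentectomy", "lobectomy"]))
# {0: 'segmentectomy', 4: 'lobectomy', 6: 'lobectomy'}
```

Positions here count regex tokens, so they will not generally equal the indices produced by `split(' ')`. Matching is still case-sensitive.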

Finally, a function that also prints where in the sentence a keyword is found, if at all (the indices are character offsets, not word positions):

def word_range(sentence, keyword):
    try:
        idx_start = sentence.index(keyword)
    except ValueError:
        # str.index raises ValueError when the keyword is absent
        return None
    idx_end = idx_start + len(keyword)
    print(f"Word '{keyword}' found within index range {idx_start} to {idx_end}")
    # Return the keyword even when it starts at index 0
    return keyword

Then use a nested list comprehension to drop the None values returned for keywords that were not found:

found_words = [x for x in [word_range(sentence2, i) for i in comorbidity_keywords] if x is not None]

What if my sentences are stored in a dataframe and I want to iterate over each sentence in a dataframe? Could you please help? — khushbu, Mar 11 '19 at 14:49
@pari Are your sentences in a single column or multiple columns? — Mark Moretto, Mar 12 '19 at 18:50
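
For the single-column case, a minimal sketch, assuming the sentences live in a column named `text` (a hypothetical name, as is the sample data) and that whole-word, case-sensitive matching is what you want:

```python
import re
import pandas as pd

comorbidity_keywords = ["segmentectomy", "lobectomy"]

# Hypothetical data: one sentence per row in a column called "text"
df = pd.DataFrame({"text": [
    "evidence of apicoposterior segmentectomy of LUL",
    "no prior surgery",
    "status post lobectomy and segmentectomy",
]})

# Build one pattern, here \b(?:segmentectomy|lobectomy)\b;
# re.escape guards against keywords containing regex metacharacters
pattern = r"\b(?:" + "|".join(map(re.escape, comorbidity_keywords)) + r")\b"

# Series.str.findall returns, for each row, the list of matches in that row
df["matches"] = df["text"].str.findall(pattern)
print(df["matches"].tolist())
# [['segmentectomy'], [], ['lobectomy', 'segmentectomy']]
```

With several text columns, the same `str.findall` call could be applied to each column in turn.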

score 0 · Answer 2 · answered Mar 10 '19 at 12:26

There is probably a more efficient way to do this, but this is what I came up with, using two for loops over the two lists.

matches = []
for ckeyword in comorbidity_keywords:
    # df1.values.tolist() yields one list of cell values per row
    for keyword in df1.values.tolist():
        if any(ckeyword in key for key in keyword):
            matches.append(ckeyword)
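
The same idea can be sketched without the DataFrame, assuming the noun phrases are available as a plain list of strings. `match_keywords` and the sample phrases are illustrative, not taken from the question:

```python
def match_keywords(phrases, keywords):
    """Return the keywords that occur as a substring of at least one phrase.

    Each keyword is tested once, so it is not repeated when several
    phrases contain it.
    """
    return [kw for kw in keywords if any(kw in phrase for phrase in phrases)]

# Hypothetical noun phrases, standing in for blob_sentence.noun_phrases
phrases = ["apicoposterior segmentectomy", "soft tissue", "surgical bed"]
comorbidity_keywords = ["segmentectomy", "lobectomy"]
print(match_keywords(phrases, comorbidity_keywords))  # ['segmentectomy']
```

`any` stops at the first phrase that contains the keyword, which avoids scanning the rest of the list for that keyword.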
