I am trying to look for keywords in sentences which is stored as a list of lists. The outer list contains sentences and the inner list contains words in sentences. I want to iterate over each word in each sentence to look for keywords defined and return me the values where found.
This is how my token_sentences looks like.
I took help from this post. How to iterate through a list of lists in python? However, I am getting an empty list in return.
This is the code I have written.
import nltk
from nltk.tokenize import TweetTokenizer, sent_tokenize, word_tokenize
text = "MDCT SCAN OF THE CHEST: HISTORY: Follow-up LUL nodule. TECHNIQUES: Non-enhanced and contrast-enhanced MDCT scans were performed with a slice thickness of 2 mm. COMPARISON: Chest CT dated on 01/05/2018, 05/02/207, 28/09/2016, 25/02/2016, and 21/11/2015. FINDINGS: Lung parenchyma: There is further increased size and solid component of part-solid nodule associated with internal bubbly lucency and pleural tagging at apicoposterior segment of the LUL (SE 3; IM 38-50), now measuring about 2.9x1.7 cm in greatest transaxial dimension (previously size 2.5x1.3 cm in 2015). Also further increased size of two ground-glass nodules at apicoposterior segment of the LUL (SE 3; IM 37), and superior segment of the LLL (SE 3; IM 58), now measuring about 1 cm (previously size 0.4 cm in 2015), and 1.1 cm (previously size 0.7 cm in 2015) in greatest transaxial dimension, respectively."
tokenizer_words = TweetTokenizer()
tokens_sentences = [tokenizer_words.tokenize(t) for t in
nltk.sent_tokenize(text)]
nodule_keywords = ["nodules","nodule"]
count_nodule =[]
def GetNodule(sentence, keyword_list):
s1 = sentence.split(' ')
return [i for i in s1 if i in keyword_list]
for sub_list in tokens_sentences:
result_calcified_nod = GetNodule(sub_list[0], nodule_keywords)
count_nodule.append(result_calcified_nod)
However, I am getting the empty list as a result for the variable in count_nodule.
This is the value of first two rows of "token_sentences".
token_sentences = [['MDCT', 'SCAN', 'OF', 'THE', 'CHEST', ':', 'HISTORY', ':', 'Follow-up', 'LUL', 'nodule', '.'],['TECHNIQUES', ':', 'Non-enhanced', 'and', 'contrast-enhanced', 'MDCT', 'scans', 'were', 'performed', 'with', 'a', 'slice', 'thickness', 'of', '2', 'mm', '.']]
Please help me to figure out where I am doing wrong!