I think I may be fundamentally confused about something in Python or NLTK. I'm generating a list of tokens from a paper abstract and trying to check whether a search word appears among the tokens. I do know about concordance, but it doesn't fit how I intend to use the comparison.
Here is my code:
import nltk

def tokenize(text):
    # text is expected to expose get_text(), returning the abstract as a string
    tokens = nltk.word_tokenize(text.get_text())
    return tokens

def search_abstract_single_word(tokens, keyword):
    # count the tokens that exactly equal the keyword
    match = 0
    for token in tokens:
        if token == keyword:
            match += 1
    return match

def search_file_single_word(abstract_list, keyword):
    # one match count per abstract, in the same order as abstract_list
    matches = list()
    for item in abstract_list:
        tokens = tokenize(item)
        match = search_abstract_single_word(tokens, keyword)
        matches.append(match)
    return matches
I've confirmed that the tokens and keyword being passed in are correct, but match (and thus the entire list of matches) always evaluates to zero. My understanding was that word_tokenize returns a list of strings, so I don't see why, for example, when token = computer and keyword = computer, token == keyword does not return True and increment match.
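For reference, here is a minimal standalone sketch of the comparison I expect to work. It assumes the tokens are plain Python strings; the token list is hand-written as a stand-in for word_tokenize output, and count_matches is a hypothetical helper name, not part of my real code.

```python
def count_matches(tokens, keyword):
    """Count the tokens that are exactly equal to keyword (case-sensitive)."""
    return sum(1 for token in tokens if token == keyword)

# Hypothetical token list standing in for nltk.word_tokenize output.
tokens = ["A", "computer", "is", "not", "a", "Computer", "."]

# Only the exact, lowercase "computer" should be counted.
print(count_matches(tokens, "computer"))
```

With plain strings like these, == compares character by character, so any extra character in the keyword (including whitespace) would make every comparison fail.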
EDIT: In a standalone class/main method this code does appear to work. However, the code is being called from a tkinter window like so:
self.keywords = ""
....
self.keywords_box = Text(self.Frame2)
....
self.Submit = Button(master)
self.Submit.configure(command=self.submit)
....

# triggered by submit button
def submit(self):
    self.keywords += self.keywords_box.get("1.0", END)

# triggered by run button after keyword saved
def run(self):
    search_input = self.keywords
    ....
    # use pandas to read excel file, create abstracts, and store
    ....
    matches = search_file_single_word(abstract_list, search_input)
    for match in matches:
        self.output_box.insert(END, match)
        self.output_box.insert(END, '\n')
Because print(keyword) produced the expected output when I inserted it into search_file_single_word, I had assumed the value was being passed correctly. Is it actually passing the tkinter property along and refusing to evaluate it against the token?
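One check that might separate the two possibilities is to look at repr(keyword) instead of print(keyword), since print hides trailing whitespace. The sketch below is only an illustration under an assumption: raw_keyword is a hand-written stand-in for what self.keywords_box.get("1.0", END) returns, based on the fact that a Tk Text widget always ends its content with a newline and the END index includes it. describe_keyword is a hypothetical helper.

```python
def describe_keyword(raw):
    """Return the repr of the raw value and of its whitespace-stripped form."""
    return repr(raw), repr(raw.strip())

# Assumed stand-in for self.keywords_box.get("1.0", END); the trailing
# newline mimics the one a Tk Text widget appends to its content.
raw_keyword = "computer\n"
token = "computer"

print(describe_keyword(raw_keyword))   # repr makes the newline visible
print(token == raw_keyword)            # comparison against the raw value
print(token == raw_keyword.strip())    # comparison after stripping whitespace
```

If the repr does show a trailing newline in the real application, reading the widget with self.keywords_box.get("1.0", "end-1c") or calling .strip() on the result would be two ways to drop it before the comparison.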