I would like to count the frequency of the three words preceding and following a specific word in a text file that has been converted into tokens.
import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams
from collections import Counter

# Read the whole book, join the lines and lowercase everything
with open('dracula.txt', 'r', encoding="ISO-8859-1") as textfile:
    text_data = textfile.read().replace('\n', ' ').lower()

tokens = word_tokenize(text_data)
text = nltk.Text(tokens)

# Build 4-grams and count how often each one occurs
grams = ngrams(tokens, 4)
freq = Counter(grams)
freq.most_common(20)
I don't know how to filter the n-grams for the string 'dracula'. I also tried:
text.collocations(num=100)
text.concordance('dracula')
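I imagine the filtering would look roughly like the sketch below, keeping only the 4-grams where 'dracula' is the last or the first token, but I am not sure this is the right approach:
# Rough idea (untested): filter the 4-grams by the position of 'dracula'
grams = list(ngrams(tokens, 4))
preceding = Counter(g for g in grams if g[3] == 'dracula')   # three words before 'dracula'
following = Counter(g for g in grams if g[0] == 'dracula')   # three words after 'dracula'
preceding.most_common(20)
following.most_common(20)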
The desired output would look something like this, with counts:
Three words preceding 'dracula', sorted by count
(('and', 'he', 'saw', 'dracula'), 4),
(('one', 'cannot', 'see', 'dracula'), 2)
Three words following 'dracula', sorted by count
(('dracula', 'and', 'he', 'saw'), 4),
(('dracula', 'one', 'cannot', 'see'), 2)
Trigrams containing 'dracula' in the middle, sorted by count
(('count', 'dracula', 'saw'), 4),
(('count', 'dracula', 'cannot'), 2)
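For the middle case I am guessing I would need trigrams instead of 4-grams and then check the middle position, something like this (again, untested):
# Guess for the middle case: build trigrams and keep the ones
# whose middle token is 'dracula'
trigrams = ngrams(tokens, 3)
middle = Counter(g for g in trigrams if g[1] == 'dracula')
middle.most_common(20)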
Thank you in advance for any help.