I'm trying to write a function that returns a dictionary whose keys are pairs of words that appear consecutively in the input file and whose values are lists of every word that has followed that pair in the file. For example, suppose the input file contained only the sentence "This clause is first, and this clause came second." The resulting dictionary should be:
{(this, clause):[is, came], (clause, is):[first], (is, first):[and], (first, and):[this], (and, this):[clause], (clause, came):[second]}
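
import string

def predictive(text_file):
    # Read the whole file; the with block closes it automatically
    with open(text_file, encoding='utf8') as file:
        text = file.read()
    # Characters to strip: ASCII punctuation, a few extra symbols, and digits
    punc = string.punctuation + '’”—⎬⎪“⎫1234567890'
    new_text = text
    for char in punc:
        new_text = new_text.replace(char, '')
    new_text = new_text.lower()
    # Split on whitespace to get a flat list of words
    text_split = new_text.split()
    print(text_split)

predictive('gatsby.txt')
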
I used The Great Gatsby as the text file, stripped out the punctuation and digits, and lowercased the words. At this point the function only prints the list of words. I'm not sure what to do next to build and return the dictionary described above, and I would really appreciate any suggestions or a pointer in the right direction. Thanks!
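
One possible next step, as a minimal sketch: walk the word list three words at a time, using the first two as the key and appending the third to that key's list. The names `build_followers` and `followers` are made up for illustration, and the sketch assumes the input is already a cleaned list of lowercase words like `text_split` above.

```python
from collections import defaultdict

def build_followers(words):
    """Map each consecutive word pair to the list of words that followed it."""
    # defaultdict(list) creates an empty list the first time a pair is seen
    followers = defaultdict(list)
    # Zipping three staggered views of the list yields (w1, w2, w3) triples
    for first, second, third in zip(words, words[1:], words[2:]):
        followers[(first, second)].append(third)
    # Convert back to a plain dict for the caller
    return dict(followers)

# The example sentence from the question, already cleaned and lowercased
words = "this clause is first and this clause came second".split()
result = build_followers(words)
print(result[("this", "clause")])  # ['is', 'came']
```

Inside `predictive`, this would presumably replace the `print(text_split)` line with something like `return build_followers(text_split)`. The last two words of the file never become a key because nothing follows them, which matches the expected dictionary in the example.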