The code on GeeksforGeeks is kinda outdated and lacks a full working example =(
Let's walk through the code step-by-step instead of reaching for some copy+paste answer!
Download the data/model dependencies
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('reuters')
Import the modules necessary to pre-process the data from Reuters
import string
import random
from nltk.corpus import stopwords
# write the removal characters such as : Stopwords and punctuation
stop_words = set(stopwords.words('english'))
string.punctuation = string.punctuation + '“' + '”' + '-' + '‘' + '’' + '—'
removal_list = list(stop_words) + list(string.punctuation)+ ['lt','rt']
Read the Reuters corpus and collect the n-grams
The GeeksforGeeks code hardcodes each ngram order separately, but nltk has a handy everygrams function that collects all orders in one pass (see https://stackoverflow.com/a/54177775/610569):
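To see what everygrams gives you, here's a sketch on a toy sentence (no corpus download needed); the words are made up for illustration:

```python
from nltk import everygrams

# All 1-, 2- and 3-grams of a toy sentence, in one call.
grams = list(everygrams(['the', 'cat', 'sat'], 1, 3))
print(sorted(grams))
# → [('cat',), ('cat', 'sat'), ('sat',), ('the',), ('the', 'cat'), ('the', 'cat', 'sat')]
```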
from nltk.corpus import reuters
from nltk import FreqDist, ngrams, everygrams
from itertools import chain
sents = reuters.sents()[:30]
# everygrams works on one sentence at a time, so chain the per-sentence generators.
one_two_three_grams = chain(*[everygrams(sent, 1, 3, pad_left=True, pad_right=True) for sent in sents])
# Drop ngrams that contain the None padding symbols; they would break the ' '.join lookups later.
one_two_three_grams = [ng for ng in one_two_three_grams if all(word for word in ng if word not in removal_list)]
Picking words from the salad
from itertools import chain
from nltk.corpus import reuters
from nltk import FreqDist, ngrams, everygrams
sents = reuters.sents()
one_to_four_ngrams = chain(*[everygrams(sent, 1, 4, pad_left=True, pad_right=True) for sent in sents])
one_to_four_ngrams = [ng for ng in one_to_four_ngrams if all(word for word in ng if word not in removal_list)]
# Keep a counter of ngrams.
word_salad = FreqDist(one_to_four_ngrams)
# Given an input "prompt" / prefix
prefix = 'it will'
# Check what's most likely to come next:
print([ng for ng in word_salad if ' '.join(ng).lower().startswith(prefix.lower())])
[out]:
[('it', 'will'), ('It', 'will'), ('it', 'will', 'impose'), ('it', 'will', 'impose', '300'), ('it', 'will', 'mean'), ('it', 'will', 'mean', 'the'), ('it', 'will', 'be'), ('it', 'will', 'be', 'extended'), ('it', 'will', 'have'), ('it', 'will', 'have', 'on'), ('it', 'will', 'establish'), ('it', 'will', 'establish', 'a'), ('it', 'will', 'vastly'), ('it', 'will', 'vastly', 'expand'), ('It', 'will', 'be'), ('It', 'will', 'be', 'the'), ('it', 'will', 'also'), ('it', 'will', 'also', 'open'), ('It', 'will', 'only'), ('It', 'will', 'only', 'disappear'), ('It', 'will', 'remain'), ('It', 'will', 'remain', 'very'), ('it', 'will', 'not'), ('it', 'will', 'not', 'allow'), ('it', 'will', 'withdraw'), ('it', 'will', 'withdraw', 'the'), ('it', 'will', 'concentrate'), ('it', 'will', 'concentrate', 'on')]
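If you only want the single most frequent continuation rather than all matches, the FreqDist already keeps the counts. A toy sketch with made-up counts standing in for word_salad above:

```python
from nltk import FreqDist

# Hypothetical ngram counts, for illustration only.
toy_salad = FreqDist({('it', 'will', 'be'): 3, ('it', 'will', 'rain'): 1, ('he', 'will'): 2})

prefix = 'it will'
matches = [ng for ng in toy_salad if ' '.join(ng).lower().startswith(prefix)]
best = max(matches, key=lambda ng: toy_salad[ng])
print(best)  # → ('it', 'will', 'be')
```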
What about some probabilities?
See https://www.kaggle.com/code/alvations/n-gram-language-model-with-nltk
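For a taste of what that notebook does, nltk.lm can fit a maximum-likelihood model directly. A minimal sketch on toy sentences (no corpus download needed; the sentences are made up for illustration):

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

toy_sents = [['it', 'will', 'be', 'fine'], ['it', 'will', 'rain']]
# padded_everygram_pipeline pads each sentence and yields the training ngrams plus the vocab.
train, vocab = padded_everygram_pipeline(3, toy_sents)

lm = MLE(3)  # a trigram maximum-likelihood model
lm.fit(train, vocab)

# P(be | it will) = count('it will be') / count('it will')
print(lm.score('be', ['it', 'will']))  # → 0.5
```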