I want my code to split a text file into single- and double-character n-grams. For example, if the word 'dogs' came up, I would want 'do', 'og', and 'gs'. The problem is that I can only seem to split the text into whole words.
I tried a simple split(), but that only separates the text on whitespace, so it doesn't give me overlapping n-grams.
from collections import Counter
from nltk.util import ngrams

def ngram_dist(fname, n):
    with open(fname, 'r') as fp:
        for lines in fp:
            for words in lines:
                result = Counter(ngrams(fname.split(), n))
    return result
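
For comparison, here is a minimal sketch of one way to get overlapping character n-grams, assuming that words are separated by whitespace and that n-grams should not cross word boundaries. The helper name `char_ngrams` is hypothetical and not part of NLTK. Note that the snippet above calls `fname.split()`, which splits the file name string rather than the file's contents; the sketch splits each line instead and slices every word with a sliding window.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return the overlapping character n-grams of a single word.

    Hypothetical helper: slides a window of width n across the word.
    Words shorter than n yield an empty list.
    """
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_dist(fname, n):
    """Count character n-grams over every whitespace-separated word in a file."""
    counts = Counter()
    with open(fname, 'r') as fp:
        for line in fp:
            # Split the line (not the file name) into words.
            for word in line.split():
                counts.update(char_ngrams(word, n))
    return counts


print(char_ngrams('dogs', 2))  # ['do', 'og', 'gs']
```

Calling `ngram_dist(fname, 1)` and `ngram_dist(fname, 2)` would give the single- and double-character counts separately. `nltk.util.ngrams` can also be applied to a string directly, for example `ngrams('dogs', 2)`, but it yields tuples of characters such as `('d', 'o')` rather than joined strings.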