As a personal project I have become interested in building a primitive spellchecker in Python.
Here's the issue: I know how to capture individual words from a target document and compare them, and I'm relatively confident I could get a program to recognise minor deviations from a word (such as a difference of 1 or 2 characters from the intended correct word).
However, I thought it would be interesting to store a number against each key on the keyboard. For instance, if the anomalous character was an 's', the program could identify the characters directly adjacent to 's' on the keyboard (via some mathematical scheme), check whether substituting one of those adjacent characters makes the word correct, and if so ask the user "Did you mean: {suggested_word}?"
I hope that makes sense. Here is the primitive beginning of a prototype:
from string import ascii_lowercase

def newInput():
    """Prompt until the user enters exactly one lowercase letter, then return it."""
    while True:
        # input() always returns a str in Python 3, so no str() conversion
        # or TypeError handling is needed
        key_press = input("Please enter ONLY ONE alphabetic character:")
        # check that exactly one character was entered; this comes first because
        # the `in` test below would also accept substrings such as "abc" or ""
        if len(key_press) != 1:
            print("A single character not entered, restarting")
            continue
        # check input is a lowercase alphabetic character
        if key_press not in ascii_lowercase:
            print("Your entry was not a recognised alphabetic character, please try again")
            continue
        return key_press
# associate each input character with a number for key reference in adj_chars
key_ref = {"q": 1, "w": 2, "e": 3, "r": 4, "t": 5, "y": 6, "u": 7, "i": 8, "o": 9, "p": 10}
# map each key number to the numbers of its adjacent keys
adj_chars = {"1": (2, 4), "2": (21, 2, 5, 7, 8), "3": (4, 5, 8, 10, 11), "4": (7, 8, 11, 13, 14), "5": (10, 11, 14, 16, 17)}
In this prototype, an example association would be: "w" has a key_ref value of 2, and 2 appears in the adj_chars tuple for "1" (which corresponds to the key_ref value 1, i.e. "q"). Therefore "q" is adjacent to "w".
This example would work, but I'd like to find a more efficient way than mapping all the numbers individually.
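One direction I have been considering, as a rough sketch only: give each key an (x, y) position and treat two keys as adjacent when they are close together, so that no adjacency table has to be typed out by hand. Everything here is an assumption for illustration: the names ROWS, ROW_OFFSETS, POSITIONS, adjacent_keys and suggest are hypothetical, the row offsets are an approximation of the stagger on a standard QWERTY layout, and the 1.5 distance threshold is simply a value that appears to give sensible neighbours.

```python
import math

# The three letter rows of a QWERTY keyboard, top to bottom.
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
# Assumed horizontal stagger of each row, measured in key widths.
ROW_OFFSETS = (0.0, 0.25, 0.75)

# Map each letter to an (x, y) position: x is the column plus the row's
# stagger, y is the row index.
POSITIONS = {
    char: (col + offset, row)
    for row, (letters, offset) in enumerate(zip(ROWS, ROW_OFFSETS))
    for col, char in enumerate(letters)
}

def adjacent_keys(char, max_distance=1.5):
    """Return the set of letters within max_distance key widths of char."""
    if char not in POSITIONS:
        return set()  # digits, punctuation, etc. have no position here
    x, y = POSITIONS[char]
    return {
        other
        for other, (ox, oy) in POSITIONS.items()
        if other != char and math.hypot(ox - x, oy - y) <= max_distance
    }

def suggest(word, dictionary):
    """Return dictionary words reachable by swapping one character of word
    for a key adjacent to it on the keyboard."""
    word = word.lower()
    suggestions = set()
    for i, char in enumerate(word):
        for neighbour in adjacent_keys(char):
            candidate = word[:i] + neighbour + word[i + 1:]
            if candidate in dictionary:
                suggestions.add(candidate)
    return sorted(suggestions)

print(sorted(adjacent_keys("s")))  # ['a', 'd', 'e', 'w', 'x', 'z']
for suggested_word in suggest("wprd", {"word", "ward"}):
    print(f"Did you mean: {suggested_word}?")  # Did you mean: word?
```

This only handles a single substituted character per word, and "ward" is not suggested for "wprd" because 'a' is not next to 'p'. I am not sure whether a distance threshold like this is the best approach or whether there is a more standard technique.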