
As a personal project I have become interested in building a primitive spellchecker in Python.

Here's the issue: I know how to capture individual words from a target document and compare them, and I'm relatively confident I could get a program to recognise small deviations from a word (such as one or two characters differing from the supposedly correct word).
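A minimal sketch of the simplest case, a single substituted character, assuming both words have the same length (insertions and deletions are not handled). The function name `single_substitution` is hypothetical:

```python
from typing import Optional


def single_substitution(typed: str, correct: str) -> Optional[int]:
    """Return the index of the only differing character, or None.

    Only same-length words differing in exactly one position are matched;
    insertions and deletions are deliberately not covered here.
    """
    if len(typed) != len(correct):
        return None
    # Collect every position where the two words disagree.
    diffs = [i for i, (a, b) in enumerate(zip(typed, correct)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


print(single_substitution("hwllo", "hello"))  # → 1
print(single_substitution("hello", "hello"))  # → None
print(single_substitution("helo", "hello"))   # → None
```

The returned index tells you which character is "anomalous", which is the piece the keyboard-adjacency check needs.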

However, I thought it would be interesting to store a number against each key on the keyboard. For instance, if the anomalous character were an 's', the program could identify the characters directly adjacent to 's' on the keyboard (via a mathematical system), check whether substituting one of those adjacent characters makes the word correct, and if so flag to the user "Did you mean: {suggested_word}?"
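The check described in the previous paragraph might look roughly like this. It is only a sketch: `NEIGHBOURS` is a small, hypothetical hand-written QWERTY table covering just a few keys, `suggest` is a made-up name, and the word set stands in for a real dictionary.

```python
# Hypothetical hand-written neighbour table for a few QWERTY keys only.
NEIGHBOURS = {
    "a": "qwsz",
    "s": "weadzx",
    "d": "ersfxc",
    "e": "wrsd",
}


def suggest(word, dictionary):
    """Return dictionary words formed by replacing one character of `word`
    with a key adjacent to it (according to NEIGHBOURS)."""
    suggestions = []
    for i, ch in enumerate(word):
        # Keys missing from the table simply have no neighbours to try.
        for near in NEIGHBOURS.get(ch, ""):
            candidate = word[:i] + near + word[i + 1:]
            if candidate in dictionary and candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions


words = {"sad", "dad", "sea"}  # stand-in for a real word list
for suggested_word in suggest("aad", words):
    print(f"Did you mean: {suggested_word}?")  # prints "Did you mean: sad?"
```

Only adjacent keys are tried, so "dad" is not suggested for "aad": on QWERTY, 'd' is not next to 'a'.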

I hope that makes sense. Here is an example of the primitive beginnings of a prototype:

```python
from string import ascii_lowercase

def newInput():
    """Take a single lowercase letter from the user for later functions."""

    while True:
        # input() already returns a str; normalise whitespace and case
        key_press = input("Please enter ONLY ONE alphabetic character: ").strip().lower()

        # check that exactly one character was entered; this must come first,
        # because "" and "ab" are both substrings of ascii_lowercase
        if len(key_press) != 1:
            print("A single character was not entered, please try again")
            continue

        # check the input is an alphabetic character
        if key_press not in ascii_lowercase:
            print("Your entry was not a recognised alphabetic character, please try again")
            continue

        return key_press

# associating input character with a number for key reference in adj_chars
key_ref = {"q": 1, "w": 2, "e": 3, "r": 4, "t": 5, "y": 6, "u": 7, "i": 8, "o": 9, "p": 10}

# maps associated keys by number reference
adj_chars = {"1": (2, 4), "2": (21, 2, 5, 7, 8), "3": (4, 5, 8, 10, 11), "4": (7, 8, 11, 13, 14), "5": (10, 11, 14, 16, 17)}
```

In this prototype, an example association would be: "w" has a key_ref value of 2, and 2 appears in the adj_chars tuple for "1" (which corresponds to "q" in key_ref). Therefore "q" (1) is treated as adjacent to "w" (2).

This example would work, but I'd like to find a more efficient way than mapping all the numbers individually.
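One way to avoid hand-numbering every key is to derive the neighbour table from the keyboard rows themselves. This sketch assumes a QWERTY letter layout and approximates each row's horizontal stagger in key widths (`ROW_OFFSETS`); those values, and the names `key_positions` and `build_neighbours`, are illustrative assumptions rather than anything from a library.

```python
from itertools import combinations

# Assumed QWERTY letter rows and an approximate stagger per row (key widths).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]


def key_positions(rows=ROWS, offsets=ROW_OFFSETS):
    """Map each character to (row index, horizontal centre of the key)."""
    return {
        ch: (r, offsets[r] + c)
        for r, row in enumerate(rows)
        for c, ch in enumerate(row)
    }


def build_neighbours(rows=ROWS, offsets=ROW_OFFSETS):
    """Two keys are neighbours if they sit side by side in the same row, or
    in adjacent rows with centres less than one key width apart."""
    pos = key_positions(rows, offsets)
    neighbours = {ch: set() for ch in pos}
    for a, b in combinations(pos, 2):
        (ra, xa), (rb, xb) = pos[a], pos[b]
        # Offsets are multiples of 0.25, so these float comparisons are exact.
        same_row = ra == rb and abs(xa - xb) == 1
        adjacent_rows = abs(ra - rb) == 1 and abs(xa - xb) < 1
        if same_row or adjacent_rows:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


NEIGHBOURS = build_neighbours()
print(sorted(NEIGHBOURS["s"]))  # → ['a', 'd', 'e', 'w', 'x', 'z']
```

A different layout would only need different row strings (and, if the physical stagger differs, different offsets); no per-key numbers have to be maintained by hand.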

Scott Anderson
  • You probably want to get the keycode that generated the character (see https://stackoverflow.com/questions/575650/how-to-obtain-the-keycodes-in-python), which can vary depending on the keyboard layout used. Typing `a` instead of `s` might be a common typo on a QWERTY keyboard, but not so much on a Dvorak keyboard, for example. A keyboard layout basically *is* the mapping you are looking for, though in the reverse direction. (You want character-to-keycode, not keycode-to-character.) – chepner Nov 25 '17 at 17:00
  • You might consider creating a two dimensional array and base "nearness" on the sum of the differences in the two arrays. Something along those lines? Of course, for only 26^2 combinations, you could just create a hash of differences (for speed) once you've computed it. –  Nov 26 '17 at 17:34
  • Thank you for the responses. I hadn't considered taking the keycode of the input rather than the character, chepner. Also, I can't reply to you directly, Barry, but I think I understand what you mean. Is it possible to give a small segment of the two-dimensional arrays? – Scott Anderson Nov 28 '17 at 23:35
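A small illustration of the two-dimensional-array idea from the comments, assuming a plain QWERTY grid with the row stagger ignored. "Nearness" is taken here as the sum of the row and column differences, and all 26^2 pairs are precomputed into a dict as suggested. The names `KEYBOARD`, `COORDS`, `key_distance` and `DISTANCES` are hypothetical.

```python
from itertools import product
from string import ascii_lowercase

# Assumed QWERTY letter layout as a 2D array (row stagger is ignored).
KEYBOARD = [
    list("qwertyuiop"),
    list("asdfghjkl"),
    list("zxcvbnm"),
]

# Look up each letter's (row, column) position in the grid once.
COORDS = {
    ch: (r, c)
    for r, row in enumerate(KEYBOARD)
    for c, ch in enumerate(row)
}


def key_distance(a, b):
    """Sum of the row and column differences (Manhattan distance)."""
    (ra, ca), (rb, cb) = COORDS[a], COORDS[b]
    return abs(ra - rb) + abs(ca - cb)


# Precompute all 26 * 26 pairs so later lookups are a single dict access.
DISTANCES = {
    (a, b): key_distance(a, b)
    for a, b in product(ascii_lowercase, repeat=2)
}

print(len(DISTANCES))         # → 676
print(DISTANCES[("s", "a")])  # → 1
print(DISTANCES[("s", "p")])  # → 9
```

Because the stagger is ignored, diagonal neighbours such as `s` and `e` come out at distance 2 rather than 1, so any threshold for "adjacent" would need tuning (or the grid would need fractional row offsets).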
