
I am trying to write a program that guesses the word the user is thinking of, but right now it works purely by elimination. Does anyone have an idea of how to make it better?

Here is a brief explanation of how it works now:

I have a list of words stored in "palavras.txt"; these words are then loaded into a regular Python list.

The first question is: "How many letters does your word have?" Based on that, the program eliminates all the words that do not have that number of letters. After that, it creates a list of the letters at each position, ordered by how often they appear in that position.

Then comes the second question: "Is the letter "x" the first letter of your word?" If the answer is "no", it deletes all the words that contain that letter in that position and moves on to the next most common letter for that position, and so on. If the answer is "yes", it deletes all the words that do not contain that letter in that specific position and moves on to the next position of the word, and so on until the word is complete.
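For reference, the core elimination step described above boils down to something like the sketch below (the names are illustrative, not taken from the actual program):

from collections import Counter

# Letters seen at position `pos` (0-based), ordered from most to least frequent
def most_common_letters(words, pos):
    counts = Counter(w[pos] for w in words if len(w) > pos)
    return [letter for letter, _ in counts.most_common()]

# Keep only the words consistent with the player's yes/no answer
def filter_words(words, pos, letter, answer_is_yes):
    if answer_is_yes:
        return [w for w in words if w[pos] == letter]
    return [w for w in words if w[pos] != letter]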

It works every time, but sometimes it takes a lot of questions. Is there a better way to do it? AI? Machine learning, maybe?

The code is not important since I'm just looking for ideas, but if anyone is curious, here is how I did it:

import os
from unicodedata import normalize
import random
import string

# Removes punctuation from a word
def remover_pont(txt):
    return txt.translate(str.maketrans('', '', string.punctuation))

# Removes accents from a word
def remover_acentos(txt):
    return normalize('NFKD', txt).encode('ASCII', 'ignore').decode('ASCII')

# Returns the letters used at that position, ordered from most to least frequent
def letramusada(lista, pletra):
    pletraordem = []
    pl = []

    # Collect the letter at the requested position from every word
    for n in lista:
        try:
            pl.append(n[pletra - 1])
        except IndexError:
            pass

    # Count how often each letter appears at that position
    counts = {}
    for k in pl:
        if k in counts:
            counts[k] += 1
        else:
            counts[k] = 1

    # Sort the letters by frequency, most common first
    pletraordem2 = sorted(counts.items(), key=lambda t: t[1], reverse=True)
    for c in pletraordem2:
        pletraordem.append(c[0])

    return pletraordem

# Lê o "banco de dados" que contém as palavras e as armazena na variável "palavras", sem acentos
file = open('palavras.txt')
palavras = file.read().split("\n")

# Armazena a quantidade de letras que a palavra pensada tem
nletras = int(input('Digite o número de letras da palavra (considerando hífen, caso haja) que você pensou, com máximo de 8: '))

# Declara listas que serão usadas em seguida
npalavras = []
palavras2 = []
palavras3 = []
# Armazena todas as palavras que contém a quantidade de letras escolhida anteriormente em uma nova lista chamada "nletras", desconsiderando pontos

for n in palavras:
    if nletras == len(n):
        npalavras.append(remover_acentos(n).lower())

c = 0
erro = 0

for k in range(1, nletras + 1):
    ordem = letramusada(npalavras, k)
    cond = 0
    try:
        while cond == 0:
            if len(npalavras) < 20 and c == 0:
                print("\nHmmm, estou chegando perto!\n")
                c += 1
            if len(npalavras) < 3:
                break
            for letra in ordem:
                r = str(input("A {} letra da sua palavra é a letra \"{}\"? [S/N] ".format(k, letra))).lower()
                r = r[0]
                if r == "s":
                    # Keep only the words with this letter at this position
                    npalavras = [n for n in npalavras if n[k - 1] == letra]
                    cond += 1
                    break
                else:
                    # Discard every word with this letter at this position
                    npalavras = [n for n in npalavras if n[k - 1] != letra]
    except Exception:
        erro = 1
        print("\nDesculpe, não achei nenhuma palavra :(")

if erro == 0 and npalavras:
    escolha = random.choice(npalavras)
    print("\nA palavra que você pensou é: \"{}\"".format(escolha))
  • Consider posting this question in [Code Review section](https://codereview.stackexchange.com/) of Stack Exchange – buran May 31 '19 at 06:09
  • It is taking more time as you are doing a lot of scanning for each question and word list can be very high, code review community will be a good place to discuss these stuffs as this is out of scope for stackoverflow community – Ravi May 31 '19 at 06:20
  • Sorry guys, this is my first question and I was not sure where to post it. Should I exclude this one? – Jônathas Raskólnikov May 31 '19 at 06:36
  • Does my answer look anything not clear to you? Give me feedback so I can improve. Thank you. – knh190 May 31 '19 at 08:13

3 Answers


You could store the words that have already been used. Say the first user thought of the word 'carro'; you could add that word to a file, and after a few letters the program could check that list for previously used words that match the description gathered so far (e.g. "has a c as the first letter") and ask the next user whether "carro" is their word. You could improve this further by keeping a counter for each word, so that frequently used words are tried before rarely used ones.
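A minimal sketch of that idea, assuming a plain text file (the name guessed_words.txt is made up) with one "word count" pair per line:

# Load previously guessed words and how often each one was the answer
def load_counts(path="guessed_words.txt"):
    counts = {}
    try:
        with open(path) as f:
            for line in f:
                word, n = line.split()
                counts[word] = int(n)
    except FileNotFoundError:
        pass  # first run: no history yet
    return counts

# Record a successful guess and rewrite the file, most popular words first
def record_guess(word, counts, path="guessed_words.txt"):
    counts[word] = counts.get(word, 0) + 1
    with open(path, "w") as f:
        for w, n in sorted(counts.items(), key=lambda t: t[1], reverse=True):
            f.write("{} {}\n".format(w, n))

Before starting the letter-by-letter questions, the program could first ask about the most popular stored words that still match the clues gathered so far.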

TripleL

Let Brute Force Be Your Friend

People may think "machine learning" is a silver bullet, but what exactly would it learn, especially when so little information is provided? What can you optimize? Your description sounds like pure brute-force, dictionary-based password cracking, and hackers today use the power of GPUs for exactly that.

This may be a little off topic, but even with a GPU the search can be hard. If you are not constrained to a specific language / platform, the hashcat link above is useful. The famous 133 MB dictionary can be enumerated in 5 minutes on a MacBook Pro, which is far faster than guessing in Python.

The Search Space And Word Patterns

Also, the average length of an English word is about 8 letters, so this situation is really similar to a typical password, i.e. your search space is large - the upper bound is 26^8 = 208,827,064,576 strings! - except that the player can only pick from a limited word list in the game.

The actual search space is somewhat smaller because English words have patterns (some letters, such as e and s, are far more frequent than others, and pairs like ae or as appear much more often than az), but since you are already using a dictionary, I don't think this helps much.
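To put rough numbers on it (the dictionary size below is made up, just for scale):

upper_bound = 26 ** 8       # 208827064576 possible 8-letter strings
dictionary_size = 250_000   # illustrative size of a word list
print(upper_bound // dictionary_size)  # the word list shrinks the space by a factor of ~800,000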

The Non-Dictionary Approach

Another idea: the process is quite close to recovering a DNA sequence, which also has patterns, though the given information may vary. Think of it as word suggestion. Bioinformatics uses the probabilistic patterns in DNA sequences for imputation.

This method helps when you can guess the word / sequence progressively. Otherwise you can only use a brute-force approach (for example, when your word can only be recovered from a hash).

A classic method used in search engines, input methods and DNA imputation is the hidden Markov model. It guesses the next character based on your previous input, using probabilities pre-computed from real words.

This can be combined with a dictionary to rank your suggestions (guesses) and make the guessing more accurate.
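A minimal sketch of that idea using simple bigram statistics (a first-order Markov chain rather than a full HMM), trained on whatever word list you already have:

from collections import defaultdict, Counter

# Count which letter tends to follow which across the word list
def train_bigrams(words):
    following = defaultdict(Counter)
    for w in words:
        for a, b in zip(w, w[1:]):
            following[a][b] += 1
    return following

# Letters most likely to come after `prev_letter`, best first
def likely_next_letters(following, prev_letter):
    return [letter for letter, _ in following[prev_letter].most_common()]

model = train_bigrams(["carro", "carta", "casa", "cama"])
print(likely_next_letters(model, "a"))  # e.g. ['r', 's', 'm']

Once the first letters are confirmed, the program can ask about the statistically likely continuations first instead of walking through the alphabet blindly.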

knh190

There's another post that talks about a word-suggestion algorithm, and it even has the Python code for it. Here's the link: What algorithm gives suggestions in a spell checker?
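Spell-checker suggestions of that kind are usually based on some form of edit distance; a minimal sketch using difflib from the standard library (not the code from the linked post):

import difflib

# Return up to `n` dictionary words closest to `word`
def suggest(word, dictionary, n=3):
    return difflib.get_close_matches(word, dictionary, n=n)

print(suggest("carr", ["carro", "carta", "casa", "barro"]))  # closest matches first, e.g. ['carro', ...]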

TripleL