How to find a word from a file in a string where two letters have been changed to 0 (Python 3)

Question

I don't have any code, because I honestly have no idea how to solve this problem, so I'd be happy if you could help me come up with an algorithm or any other approach.

I have a list of letters that contains two 0s. Each 0 stands for one unknown letter of the word. Somewhere in this list is a word from a file. The file contains many different Czech words (it's a pretty big one).

I need to find the word from the file in the list and decode the letters hidden behind the zeros.

Example input:

['a', 't', '0', 'l', 'u', 'r', 'i', '0', 'r', 'x']

Example of data in the file (normally it has 32,000 words):

telepatech
telepatie
teleport
teleportovala
teleportovat
teleportujete
telepsychickou
teleskop
teleskopu
teletextem
teletina
teletou
televizi
tellur
telurid
tematicky
tematizace
temena

Desired Output:

telurid

Your question is too broad. Make sure to share what you've attempted so far as well as some reproducible code block and sample data file so somebody can provide some help here. — Giorgos Myrianthous, Nov 22 '19 at 18:01
I want to find a word from the file; it can be shorter than the list, and I want to ignore every other character — nuta_nu_, Nov 22 '19 at 18:15
That doesn't make any sense @AnnaSereda. How much of the word minimum needs to be matched for it to count? — Lordfirespeed, Nov 22 '19 at 18:27
Assuming the words are in a Python array, you can just use a for loop, but your question needs work. DM me if you need personal assistance ... — Jay, Nov 22 '19 at 19:35
@Lordfirespeed As far as I know, there is always exactly one solution: if a word in the file matches all the letters (except the two 0s), that's the one. — nuta_nu_, Nov 22 '19 at 22:30
@Jay I'm new here and I'm not really sure how to DM somebody... But I suppose personal assistance is what I need. I've been trying to solve this for the last few days and I still have no idea. — nuta_nu_, Nov 22 '19 at 22:34
@Boris Honestly, I don't know; our teacher didn't mention this. — nuta_nu_, Nov 23 '19 at 14:00
@AnnaSereda ashiswin already told you the correct way to solve the problem. [Build a trie](https://stackoverflow.com/questions/11015320/how-to-create-a-trie-in-python) out of your list of words and then try to traverse the trie starting with each letter in the input. When you hit a zero you have to process all nodes at the trie position you're at instead of just the one that matches the next letter. — Boris Verkhovskiy, Nov 23 '19 at 19:31
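A minimal sketch of the approach described in this comment, assuming the whole dictionary fits in memory as a nested-dict trie. The names build_trie and search and the "$" end-of-word key are hypothetical, and the sketch also assumes (from the question) that the hidden word must cover every 0 in the input:

```python
END = "$"  # hypothetical end-of-word key; assumes no dictionary word contains "$"


def build_trie(words):
    """Build a nested-dict trie; node[END] holds the word that ends at that node."""
    trie = {}
    for w in words:
        node = trie
        for ch in w.lower():
            node = node.setdefault(ch, {})
        node[END] = w
    return trie


def search(trie, letters, wildcard="0"):
    """Return words spelled by a contiguous run of `letters`,
    where each wildcard may stand for any single letter."""
    needed = letters.count(wildcard)  # assumption: the word covers every 0
    found = []

    def walk(node, i, used):
        if END in node and used == needed:
            found.append(node[END])
        if i == len(letters):
            return
        ch = letters[i]
        if ch == wildcard:
            # a 0 could be any letter, so follow every child of this node
            for key, child in node.items():
                if key != END:
                    walk(child, i + 1, used + 1)
        elif ch in node:
            walk(node[ch], i + 1, used)

    # start a traversal at every position of the input
    for start in range(len(letters)):
        walk(trie, start, 0)
    return list(dict.fromkeys(found))  # drop duplicates, keep order
```

With the sample words from the question, search(build_trie(words), ['a', 't', '0', 'l', 'u', 'r', 'i', '0', 'r', 'x']) should return ['telurid'].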

score 0 · Accepted Answer · answered Nov 22 '19 at 18:19

The best solution I can think of offhand would be to use a trie. Put your list of words into a trie, then traverse it with your input, treating each '0' as a wildcard that can follow any child node. Once you reach a node that marks the end of a word, you can return the word you've built!
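Stopping only at a leaf can miss words that are prefixes of other words; the sample file has teleport as well as teleportovala and teleportovat. A minimal sketch, assuming a nested-dict trie with a hypothetical "$" key as the end-of-word marker (build_trie and node_for are illustrative names), showing that the node for teleport is a complete word yet is not a leaf:

```python
END = "$"  # hypothetical end-of-word key; assumes no word contains "$"


def build_trie(words):
    """Nested-dict trie in which node[END] = True marks a complete word."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return trie


def node_for(trie, prefix):
    """Return the node reached by spelling `prefix`, or None if there is none."""
    node = trie
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return None
    return node


trie = build_trie(["teleport", "teleportovala", "teleportovat"])
node = node_for(trie, "teleport")
print(END in node)                          # True: a complete word ends here...
print(sorted(k for k in node if k != END))  # ['o']: ...but the node still has children
```

This is why the traversal should check the end-of-word marker at every node it visits rather than waiting for a node with no children.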

A bit of an introduction to Trie-s: https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014

Lordfirespeed · Answer 2 · 2019-11-22T18:39:00.903

For a non-optimal solution, you could just iterate over the list of words.

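word = "t0luri0rx"
zerocount = word.count("0")  # every 0 must fall inside the hidden word

with open("wordsfile.txt", encoding="utf-8") as wordsfile:
    words = [line.strip().lower() for line in wordsfile]

def fits(checkword, start):
    # True if checkword lines up with word at position start,
    # treating each "0" in word as "any letter"
    window = word[start:start + len(checkword)]
    return window.count("0") == zerocount and all(
        w == "0" or w == c for w, c in zip(window, checkword))

for checkword in words:
    # try every position where checkword could fit inside word
    if any(fits(checkword, start) for start in range(len(word) - len(checkword) + 1)):
        print(checkword)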

You'd need to put wordsfile.txt in the directory you run the Python program from (the current working directory), or else give open() the file's full path. Alternatively, if you sort out your question and make it clearer which characters are omitted, etc., you could use regular expressions (the re module) to find what you're looking for efficiently.

Regex would go a little like this:

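import re

word = "t0luri0"  # removing the rx for sake of clarity
# str.replace returns a new string, so keep its result;
# [^\W\d_] matches any single letter, including Czech ones such as ř or č
pattern = re.compile(word.replace("0", r"[^\W\d_]"), re.IGNORECASE)

with open("wordsfile.txt", encoding="utf-8") as wordsfile:
    words = [line.strip() for line in wordsfile]

for checkword in words:
    match = pattern.fullmatch(checkword)  # the whole word must fit, not just a prefix
    if match:
        print(match.group())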

This solution would, however, only match words such as telurid or tolurip (not a word, but it would match if it were in the file). It wouldn't match shorter or longer words. I guess you could insert a few tokens into the regular expression to mitigate this.
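One way to cover shorter and longer words, sketched here under the question's assumption that the hidden word contains both 0s, is to turn every slice of the input that contains both zeros into its own anchored pattern and search the whole word list with it. find_candidates is a hypothetical helper name, and the dictionary is assumed to be already loaded as a list of strings:

```python
import re


def find_candidates(letters, words):
    """Return words that exactly fill some slice of `letters` containing every 0."""
    text = "\n".join(words)  # one word per line, so ^ and $ can anchor each word
    s = "".join(letters)
    needed = s.count("0")
    found = []
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            window = s[start:end]
            if window.count("0") != needed:
                continue  # assumption: the hidden word covers every 0
            # keep real letters literal; each 0 becomes "exactly one letter"
            body = "".join(r"[^\W\d_]" if c == "0" else re.escape(c) for c in window)
            found += re.findall(f"^{body}$", text, re.MULTILINE | re.IGNORECASE)
    return list(dict.fromkeys(found))  # drop duplicates, keep order
```

For the sample input and word list in the question this should give ['telurid']. It is still a brute-force search (one scan of the word list per slice), so the trie approach scales better if the input lists get long.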

It can be any other word, so it's not really helpful. But thank you for trying! — nuta_nu_, Nov 22 '19 at 18:57
