
I don't have any code because I have no idea how to solve this problem, so I'd be happy if you could help me come up with an algorithm.

I have a list of letters and two 0s. Each zero stands in for one letter of a word. Somewhere in this list is a word from a file. The file contains many different Czech words (it is fairly large).

I need to find which word from the file is in the list and decode the letters hidden behind the zeros.

Example input:

['a', 't', '0', 'l', 'u', 'r', 'i', '0', 'r', 'x']

Example of the data in the file (normally it's 32000 words):

  • telepatech
  • telepatie
  • teleport
  • teleportovala
  • teleportovat
  • teleportujete
  • telepsychickou
  • teleskop
  • teleskopu
  • teletextem
  • teletina
  • teletou
  • televizi
  • tellur
  • telurid
  • tematicky
  • tematizace
  • temena

Desired Output:

telurid
nuta_nu_
  • Your question is too broad. Make sure to share what you've attempted so far, as well as a reproducible code block and a sample data file, so somebody can provide some help here. – Giorgos Myrianthous Nov 22 '19 at 18:01
  • Shouldn't the desired output be teluridrx? – Duarte Castanho Nov 22 '19 at 18:02
  • No, just telurid – nuta_nu_ Nov 22 '19 at 18:03
  • So you want to ignore the 2 last elements in the list? – Duarte Castanho Nov 22 '19 at 18:07
  • I want to find a word from the file. It can be shorter than the list, and I want to ignore all the remaining characters – nuta_nu_ Nov 22 '19 at 18:15
  • That doesn't make any sense @AnnaSereda. How much of the word minimum needs to be matched for it to count? – Lordfirespeed Nov 22 '19 at 18:27
  • Assuming the words are in a Python array: you can just use a for loop, but your question needs work. DM me if you need personal assistance ... – Jay Nov 22 '19 at 19:35
  • @Lordfirespeed As far as I know, there is always exactly one solution. If a word in the file matches on all letters (except the two 0s), that's the one. – nuta_nu_ Nov 22 '19 at 22:30
  • @Jay I'm new here and I'm not really sure how to DM somebody... But I suppose personal assistance is what I need... I've been trying to solve this for the last few days and I still have no idea – nuta_nu_ Nov 22 '19 at 22:34
  • How big can the input list be? – Boris Verkhovskiy Nov 22 '19 at 22:46
  • @Boris Honestly, I don't know; our teacher didn't mention this – nuta_nu_ Nov 23 '19 at 14:00
  • @AnnaSereda ashiswin already told you the correct way to solve the problem. [Build a trie](https://stackoverflow.com/questions/11015320/how-to-create-a-trie-in-python) out of your list of words and then try to traverse the trie starting with each letter in the input. When you hit a zero you have to process all nodes at the trie position you're at instead of just the one that matches the next letter. – Boris Verkhovskiy Nov 23 '19 at 19:31

2 Answers


The most efficient solution I can think of offhand would be to use a trie. You can take your list of words and put them into a trie. Then, with your input, you traverse the trie, treating each '0' as a wildcard that may follow any child of the current node. Once you reach a node that marks the end of a word, you can return the word you got!

A bit of an introduction to tries: https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014
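
A minimal sketch of that traversal, in case it helps. The names `END`, `build_trie` and `find_words` are made up for illustration, and the nested-dict trie is just one convenient representation. It also assumes, based on the wording of the question, that a valid word has to cover both zeros; drop that check if that is not the case.

```python
END = None  # hypothetical end-of-word marker; never collides with a letter key


def build_trie(words):
    """Build a nested-dict trie; node[END] holds the word that ends at that node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_words(letters, trie, wildcard="0"):
    """Return sorted (word, decoded_letters) pairs found anywhere in letters."""
    wild = [i for i, ch in enumerate(letters) if ch == wildcard]
    found = set()
    for start in range(len(letters)):
        nodes = [trie]  # every trie node still consistent with letters[start:pos]
        for pos in range(start, len(letters)):
            ch = letters[pos]
            next_nodes = []
            for node in nodes:
                if ch == wildcard:
                    # A zero can stand for any letter: follow every child.
                    next_nodes.extend(c for k, c in node.items() if k is not END)
                elif ch in node:
                    next_nodes.append(node[ch])
            nodes = next_nodes
            if not nodes:
                break
            # Assumption: a valid word must span every zero in the input.
            if wild and not (start <= wild[0] and wild[-1] <= pos):
                continue
            for node in nodes:
                if END in node:
                    word = node[END]
                    decoded = "".join(word[i - start] for i in wild)
                    found.add((word, decoded))
    return sorted(found)


sample = ["teleport", "tellur", "telurid", "tematicky"]
print(find_words(list("at0luri0rx"), build_trie(sample)))  # [('telurid', 'ed')]
```

The trie is built once from the word file, and each starting position in the input then costs at most one walk down the trie, so the size of the word list matters far less than it does when scanning the whole file for every input.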

ashiswin

For a non-optimal solution, you could just iterate over the list of words.

word = "t0luri0rx"
zeroindexes = [i for i, c in enumerate(word) if c == "0"]
strippedword = word.replace("0", "")

with open("wordsfile.txt") as wordsfile:
    words = [line.strip().lower() for line in wordsfile]

for checkword in words:
    # Strings are immutable, so build a copy without the characters at the
    # zero positions (this assumes checkword lines up with the start of word)
    strippedcheckword = "".join(
        c for i, c in enumerate(checkword) if i not in zeroindexes
    )
    if strippedcheckword in strippedword:
        print(checkword)

You'd need to put wordsfile.txt in the same folder as the Python program, unless you're willing to muck about with setting the working directory. Alternatively, if you sort out your question and make it clearer which characters may be omitted, etc., you could use regular expressions (the re module) to efficiently find what you're looking for.

Regex would go a little like this:

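import re

word = "t0luri0"  # removing the rx for sake of clarity
# str.replace returns a new string, so the result has to be reassigned.
# Note: [a-z] covers unaccented letters only; "." would also match accented ones.
word = word.replace("0", "[a-z]")
pattern = re.compile(word, re.IGNORECASE)

with open("wordsfile.txt") as wordsfile:
    words = [line.strip() for line in wordsfile]

for checkword in words:
    # fullmatch requires the whole word to fit the pattern, not just a prefix
    match = pattern.fullmatch(checkword)
    if match:
        print(match.group())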

This solution would, however, only match words such as telurid or tolurip (not a word, but it'd match if it were in the file). It wouldn't match anything shorter or longer. I guess you could insert a few tokens into the regular expression to mitigate this.
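
One possible way around that limitation, sketched here rather than tested against the real file: instead of adding tokens to a single pattern, slide each candidate word along the input and build a small pattern from the window it would cover. The helper names `to_regex` and `find_by_regex` are hypothetical, and the check that the window contains every zero is an assumption taken from the question's wording.

```python
import re


def to_regex(letters, wildcard="0"):
    """Translate a run of letters into a pattern; each wildcard matches one character."""
    return "".join("." if ch == wildcard else re.escape(ch) for ch in letters)


def find_by_regex(letters, words, wildcard="0"):
    """Return the words that fit some window of the input, in file order."""
    text = "".join(letters)
    wild = [i for i, ch in enumerate(text) if ch == wildcard]
    results = []
    for word in words:
        # Try every position where a word of this length could sit.
        for start in range(len(text) - len(word) + 1):
            end = start + len(word)
            # Assumption: the window has to contain every zero in the input.
            if wild and not (start <= wild[0] and wild[-1] < end):
                continue
            if re.fullmatch(to_regex(text[start:end], wildcard), word):
                results.append(word)
                break
    return results


sample = ["teleport", "tellur", "telurid", "tematicky"]
print(find_by_regex(list("at0luri0rx"), sample))  # ['telurid']
```

This still scans the whole word list for every input, so it keeps the simplicity of the loop above at the cost of speed; for a 32000-word file and a short input that is likely fine.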