
I have a single-line string, for example 'АРВТРВТПЛЯЖАОВР', with the word 'ПЛЯЖ' hidden in it. I also have a list of all Russian words in all their declined forms, about 1.5 million words. I want to write a loop that goes through every possible slice of the string and compares it with the entries in the list, printing each slice that matches.

To solve the problem, I wrote the following code.

rus_words = open('russian.txt') #opening a file in read mode

text = 'АРВТРВТПЛЯЖАОВР' #Initial line

length_of_text = len(text)+1 #Text length


for line in rus_words: #Iterating through the values in the file
    for i in range(length_of_text): #Iterating through the row indexes
        for j in range(1,11): #Iterating over the possible length of a word 
                              #(Here I assume that the word is no more than 10 characters)
            maybe_word = text.lower()[i:i+j] #Formation of a possible word
            if maybe_word in line: #Comparison of the received word with the values in the list
                print(maybe_word) #Output of matches
               

As a result, the program endlessly prints fragments no longer than 3 characters.

I assume the problem is either in how the file is read or in the loop. The former seems more likely, but it is not clear to me what exactly is wrong. The word list is from https://github.com/danakt/russian-words

Yaakov Bressler
SasambaDio
  • Do you mean you want to find every word from the Russian word list that occurs inside the text `'АРВТРВТПЛЯЖАОВР'`? For example: `АРВТ`, `ВТРВ` (assuming these words are in russian.txt). – dinhit May 17 '23 at 09:19

1 Answer


There is a better way to do it: use the `in` operator.

The in keyword is used to check if a value is present in a sequence (list, range, string etc.). [1]

>>> 'ПЛЯЖ' in 'АРВТРВТПЛЯЖАОВР'
True
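
Note that on a string `in` is a substring test, while on a list or set it checks for an element equal to the value. This is likely why the loop in the question prints so many short fragments: a 1-3 character slice is a substring of a great many lines, and a slice taken past the end of the text is the empty string, which is "in" every line. A minimal sketch of the difference (the sample word 'пляжный' and the variable names are illustrative only):

```python
text = 'арвтрвтпляжаовр'
line = 'пляжный\n'  # a line as read from a file, trailing newline included

substring_hit = 'пляж' in line      # True: substring test on a str
exact_hit = 'пляж' in {'пляжный'}   # False: a set is checked for an equal element
empty_hit = '' in line              # True: the empty string is in every string
past_end = text[20:25]              # '': slicing past the end does not raise

print(substring_hit, exact_hit, empty_hit, repr(past_end))
```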

You can simply loop through the Russian word list and apply whatever text processing you need, such as `.strip()` or `.lower()`.

For example:

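text = 'АРВТРВТПЛЯЖАОВР'.lower()  # assuming the word list is lowercase
# the russian-words GitHub repo uses windows-1251 encoding
with open('russian.txt', encoding='windows-1251') as rus_words:
    for line in rus_words:
        word = line.strip()  # drop the trailing newline
        if word and word in text:  # skip blank lines: '' is in every string
            print(word)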
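
If you prefer the approach from the question, checking every slice of the text against the word list, one option is to load the words into a set once so that each lookup is an exact match rather than a substring test. The following is only a sketch under a few assumptions: `load_words` and `find_words` are hypothetical helper names, the 10-character limit is taken from the question, and the word list is assumed to hold one word per line.

```python
def load_words(path='russian.txt', encoding='windows-1251'):
    """Read the word list into a set, one stripped lowercase word per line."""
    with open(path, encoding=encoding) as f:
        return {line.strip().lower() for line in f if line.strip()}


def find_words(text, words, max_len=10):
    """Return every slice of text (1..max_len characters) that is in words."""
    text = text.lower()
    found = []
    for i in range(len(text)):
        # Stop at the end of the text so no empty slices are produced.
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            candidate = text[i:j]
            if candidate in words:  # exact membership test on a set
                found.append(candidate)
    return found


# Tiny stand-in for the real word list (sample data for illustration).
sample_words = {'пляж', 'море'}
print(find_words('АРВТРВТПЛЯЖАОВР', sample_words))  # ['пляж']
```

With the real file, you would pass `load_words()` as the second argument instead of `sample_words`.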
dinhit