The words in "wordslist" and the text I'm searching are both in Cyrillic. The text is encoded in UTF-8 (as set in Notepad++). I need Python to match a word in the text and return everything after that word up to a full stop followed by a newline.
EDIT
with open('C:\....txt', 'rb') as f:
    wordslist = []
    for line in f:
        wordslist.append(line)
    wordslist = map(str.strip, wordslist)
/EDIT
for i in wordslist:
    print i #so far, so good, I get Cyrillic
    wantedtext = re.findall(i+".*\.\r\n", open('C:\....txt', 'rb').read())
    wantedtext = str(wantedtext)
    print wantedtext
"wantedtext" is displayed and saved as escape sequences such as "\xd0\xb2" instead of Cyrillic characters.
What I tried:
The question "Convert bytes to a python string" is different, because no variable is involved there. Also, the solution from its chosen answer,
wantedtext.decode('utf-8')
didn't work, the result was the same. The solution from here didn't help either.
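For readers trying to reproduce the symptom, here is a minimal sketch of what str() does to a list of matches. It is written for Python 3 (the code in this question is Python 2), and the sample data and the names data, word, matches, as_list_text and decoded are hypothetical stand-ins, not taken from my files. It suggests that the escapes come from converting the whole list to a string, and that decoding each match separately gives readable text.

```python
import re

# Hypothetical sample standing in for the file contents, held as UTF-8 bytes.
data = u"слово первый текст.\r\nдругое второй текст.\r\n".encode("utf-8")
word = u"слово".encode("utf-8")

# Same pattern shape as in the question, applied to bytes.
matches = re.findall(word + b".*\\.\r\n", data)

# str() on a list uses repr() for every element, so the bytes are
# rendered as escape sequences such as \xd1\x81 rather than as Cyrillic.
as_list_text = str(matches)

# Decoding each element on its own yields readable text.
decoded = [m.decode("utf-8") for m in matches]

print(as_list_text)
print(decoded[0])
```

Calling .decode('utf-8') on the result of str(wantedtext) cannot undo this, because by then the backslash escapes are ordinary characters in the string.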
EDIT: Revised code, returning "[]".
import io
import re

with io.open('C:....txt', 'r', encoding='utf-8') as f:
    wordslist = f.read().splitlines()
for i in wordslist:
    print i
    with io.open('C:....txt', 'r', encoding='utf-8') as my_file:
        my_file_test = my_file.read()
        print my_file_test #works, prints cyrillic characters, but...
        wantedtext = re.findall(i+".*\.\r\n", my_file_test)
        wantedtext = str(wantedtext)
        print wantedtext #returns []
(Added after a comment below: This code works if you erase \r from the regular expression.)
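A likely explanation for the \r behaviour, sketched under assumptions: when a file is opened in text mode with io.open and the default newline setting, "\r\n" is translated to "\n" while reading, so a pattern that requires "\r\n" can no longer match. The snippet below is written for Python 3, uses a hypothetical temporary file in place of my real one, and adds re.escape and an optional "\r?" as illustrative choices that are not part of the code above.

```python
import io
import os
import re
import tempfile

# Hypothetical sample file with Windows line endings, written as raw bytes.
sample = u"слово первый текст.\r\nдругое второй текст.\r\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "wb") as f:
    f.write(sample.encode("utf-8"))

# Text mode with the default newline handling translates "\r\n" to "\n".
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

word = u"слово"
# re.escape guards against regex metacharacters in the search word.
with_cr = re.findall(re.escape(word) + u".*\\.\r\n", text)
without_cr = re.findall(re.escape(word) + u".*\\.\n", text)
# "\r?" accepts both line-ending styles.
tolerant = re.findall(re.escape(word) + u".*\\.\r?\n", text)

print(with_cr)
print(without_cr)
print(tolerant)
```

Passing newline='' to io.open should keep the original "\r\n" in the text, in which case the original pattern would be expected to match again.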