I want to write a Python script that uses regular expressions to pick out the lines of a source text that contain certain Greek words, and then writes those lines to 3 different files depending on which words were found.
Here is my code so far:
import regex
source=open('source.txt', 'r')
oti=open('results_oti.txt', 'w')
tis=open('results_tis.txt', 'w')
ton=open('results_ton.txt', 'w')
regex_oti='^.*\b(ότι|ό,τι)\b.*$'
regex_tis='^.*\b(της|τις)\b.*$'
regex_ton='^.*\b(τον|των)\b.*$'
for line in source.readlines():
    if regex.match(regex_oti, line):
        oti.write(line)
    if regex.match(regex_tis, line):
        tis.write(line)
    if regex.match(regex_ton, line):
        ton.write(line)
source.close()
oti.close()
tis.close()
ton.close()
quit()
The words that I check for are: ότι | ό,τι | της | τις | τον | των.
The problem is that the 3 regular expressions (regex_oti, regex_tis, regex_ton) do not match anything, so the 3 text files I created stay empty.
Maybe it's an encoding problem (Unicode)?
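One thing worth ruling out before blaming the encoding: in a normal Python string literal, '\b' is the backspace character (U+0008), not a regex word boundary, so the three patterns above would be looking for a literal backspace. Below is a minimal sketch of the same filtering with raw strings. It assumes Python 3 and a UTF-8 encoded source.txt, and it uses the standard-library re module in place of regex so that it runs without extra packages. The names PATTERNS, classify and split_file are made up for illustration.

```python
import re

# Raw strings (r'...') keep \b as a regex word boundary. In a normal
# string literal, '\b' is the backspace character instead.
PATTERNS = {
    'oti': re.compile(r'\b(ότι|ό,τι)\b'),
    'tis': re.compile(r'\b(της|τις)\b'),
    'ton': re.compile(r'\b(τον|των)\b'),
}


def classify(line):
    """Return the keys of all patterns found anywhere in the line."""
    return [key for key, pattern in PATTERNS.items() if pattern.search(line)]


def split_file(source_path='source.txt', encoding='utf-8'):
    """Copy each matching line to results_<key>.txt (UTF-8 is an assumption)."""
    outputs = {
        key: open('results_%s.txt' % key, 'w', encoding=encoding)
        for key in PATTERNS
    }
    try:
        with open(source_path, encoding=encoding) as source:
            for line in source:
                for key in classify(line):
                    outputs[key].write(line)
    finally:
        # Close every output file even if reading the source fails.
        for handle in outputs.values():
            handle.close()
```

Using search() makes the leading ^.* and trailing .*$ unnecessary, and passing encoding explicitly avoids depending on the platform default. If the words still fail to match after this change, the accented letters in the source may be stored in a different Unicode form than the ones in the patterns, which would be a separate issue.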