I have attempted to use a dictionary to replace Cyrillic words in a Unicode .txt file. I didn't expect replacing words to be difficult, but with Cyrillic text there is the added complication of the encoding (UTF-8 vs. UTF-16), which seems to be causing problems. I've tried many different versions of the code, but none of them work. I would really appreciate any help!
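To make the UTF-8 vs. UTF-16 point concrete: the encoding only matters when bytes are decoded into text. Once a file is decoded with the right codec, the same word compares equal no matter how it was stored. Below is a small illustration in Python 3 (unlike the Python 2 script in this question). The word and codecs are just examples.

```python
# The same Cyrillic word stored with two different encodings.
word = "первый"

utf8_bytes = word.encode("utf-8")    # each Cyrillic letter takes 2 bytes here
utf16_bytes = word.encode("utf-16")  # 2-byte code units plus a 2-byte BOM

print(len(utf8_bytes), len(utf16_bytes))  # 12 14

# Decoding with the matching codec gives back identical text,
# so str.replace() behaves the same for both files.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16") == word

# Decoding with the wrong codec fails (the UTF-16 BOM is not valid UTF-8).
try:
    utf16_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("wrong codec:", exc.reason)
```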
My dictionary lives in a file called chars.py (imported as chars) and contains entries like:
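# -*- coding: utf-8 -*-
# Python 2 needs this declaration for non-ASCII literals in a source file (PEP 263)
cyrillic_ordinals = {
    u'первый': u'one',
    u'второй': u'two',
    u'третий': u'three',
    u'четвёртый': u'four',
}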
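For comparison, a dictionary like this does work with plain `str.replace` once the text is already decoded. The sketch below is Python 3, so the u'' prefixes and `iteritems()` are not needed. The sample sentence is made up for illustration. The sketch also normalizes to NFC. This rests on an assumption: a letter like ё can be stored either precomposed or as е plus a combining diaeresis, and that difference alone would stop an otherwise identical word from matching.

```python
import unicodedata

cyrillic_ordinals = {
    "первый": "one",
    "второй": "two",
    "третий": "three",
    "четвёртый": "four",
}


def replace_ordinals(text, ordinals):
    """Replace every dictionary key found in text with its value."""
    # Normalize both sides to NFC so 'ё' (U+0451) and 'е' + U+0308 compare equal.
    text = unicodedata.normalize("NFC", text)
    for cyrillic, english in ordinals.items():
        text = text.replace(unicodedata.normalize("NFC", cyrillic), english)
    return text


# Hypothetical sample text; the second word uses a decomposed 'ё'.
sample = "первый и четве\u0308ртый"
print(replace_ordinals(sample, cyrillic_ordinals))  # one и four
```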
I'm not sure why my code isn't working. For context, the first part of the code defines the replacement function (where I think the error is), and the second half just handles the input and output files.
import sys
import codecs
import os
import chars

def replaceordinals(text, cyrillic_ordinals):
    # Replace each Cyrillic key with its English value
    for i, j in cyrillic_ordinals.iteritems():
        text = text.replace(i, j)
    return text

def readAndWrite(input_file, output_file):
    try:
        w_f = codecs.open(output_file, encoding='utf-8', mode='w+')
    except IOError:
        print("Can't create or edit output file. Do you have rights to create file here?")
        print("For unix systems try to use \"sudo python\" instead of \"python\"")
        return  # without this, w_f would be undefined below
    try:
        # First try to read the input as UTF-8...
        i_f = codecs.open(input_file, encoding='utf-8')
        for line in i_f:
            w_f.write(replaceordinals(line, chars.cyrillic_ordinals))
    except IOError:
        print("Can't read input file. Check your path to input file")
    except:
        # ...and fall back to UTF-16 if decoding as UTF-8 fails
        try:
            i_f = codecs.open(input_file, encoding='utf-16')
            for line in i_f:
                w_f.write(replaceordinals(line, chars.cyrillic_ordinals))
        except IOError:
            print("Can't read input file. Check your path to input file")

def main(argv):
    # If the user didn't provide paths to the input and/or output file, show an error; otherwise, run the processing
    if len(argv) != 3:
        print("Missing file arguments.\nFormat: python " + argv[0] + " /home/user/Desktop/input_file.txt /home/user/Desktop/output_file.txt")
    else:
        readAndWrite(argv[1], argv[2])

if __name__ == "__main__":
    main(sys.argv)
The output file is created, but its text is the same as the input: the Cyrillic words are not replaced by one, two, etc. Does anyone know how to fix this?
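One way to narrow the problem down is to check whether the keys occur in the decoded text at all. Below is a minimal Python 3 sketch of the same read/replace/write flow. The names `detect_encoding` and `convert` are hypothetical. Choosing the codec by sniffing the byte-order mark is also an assumption. It works for UTF-16 files saved with a BOM, such as Windows Notepad's "Unicode" option, but it would misdetect UTF-16 saved without one.

```python
import codecs
import sys
import unicodedata


def detect_encoding(path):
    """Guess the codec from a byte-order mark (a simple heuristic, not a full detector)."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        return "utf-16"     # the utf-16 codec reads the BOM to pick the byte order
    return "utf-8-sig"      # UTF-8, skipping a UTF-8 BOM if one is present


def convert(input_path, output_path, ordinals):
    """Replace ordinals in input_path, write UTF-8 to output_path, report which keys matched."""
    encoding = detect_encoding(input_path)
    with open(input_path, encoding=encoding) as f:
        text = unicodedata.normalize("NFC", f.read())

    found = {}
    for cyrillic, english in ordinals.items():
        key = unicodedata.normalize("NFC", cyrillic)
        found[cyrillic] = key in text   # did this word occur in the decoded text?
        text = text.replace(key, english)

    # Always write the result as UTF-8.
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(text)
    return found


if __name__ == "__main__" and len(sys.argv) == 3:
    ordinals = {                        # or: from chars import cyrillic_ordinals
        "первый": "one",
        "второй": "two",
        "третий": "three",
        "четвёртый": "four",
    }
    for word, hit in convert(sys.argv[1], sys.argv[2], ordinals).items():
        print(repr(word), "found" if hit else "NOT found")
```

Saved as, say, convert.py, it runs as `python3 convert.py input.txt output.txt`. If every key reports NOT found even though the file looks right, the words in the file probably differ from the dictionary keys, for example as capitalized or inflected forms like Первый or первого. In that case the encoding is likely not at fault.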