I am currently developing an application that works with Unicode characters.
The Unicode text has to be read in Python to determine its language before it is passed on to Java for processing. Currently, I read the file in Python to determine the language, and then call the corresponding Java engine to process it.
This approach takes too long because of the I/O cost involved. However, passing the Unicode characters directly as a command-line argument does not work; it throws an error:
'charmap' codec can't encode characters in position xx - xx: character maps to <undefined>.
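The 'charmap' codec named in the message is the one Python uses for single-byte code pages such as cp1252. A minimal sketch, assuming Python 3 and using cp1252 as a stand-in for whatever code page is actually in effect (the helper `try_encode` and the sample text are hypothetical), reproduces the same kind of failure without involving Java at all:

```python
def try_encode(text, encoding="cp1252"):
    """Return None if text encodes cleanly, else the error message."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# Hypothetical sample text containing characters with no cp1252 mapping
sample = "\u65e5\u672c\u8a9e"

message = try_encode(sample)
print(message)
```

The same text encodes without error when the target encoding is UTF-8, which suggests the problem lies in the encoding step rather than in the text itself.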
What I would like to do (excerpt of my code):
import subprocess

# read in the Unicode text
text = "some unicode words"
command = "java -jar unicodeProcessor.jar " + text
subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Java processes it and writes it to a file.
What I currently do:
# determine the language first, then pass the file path to Java
filepath = "filepath of text file"
command = "java -jar unicodeProcessor.jar " + filepath
subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# here the parameter is a file path instead of a string
This method is too slow.
Current code:
import subprocess

unic = open("unicode_words.txt")
words = unic.read()
# Python 2: decode byte strings to unicode; leave unicode objects as they are
if isinstance(words, str):
    convert = unicode(words, 'utf-8')
else:
    convert = words
command = "java -jar unicodeProcessor.jar " + convert
subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
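One direction that might avoid both the extra file read and the command-line encoding step is to send the text to the child process over standard input as UTF-8 bytes. The sketch below is written for Python 3 and rests on assumptions: `send_text` is a hypothetical helper, and a small Python child that echoes its input stands in for the Java program so the example can run on its own.

```python
import subprocess
import sys


def send_text(command, text):
    """Run command, write text to its stdin as UTF-8, return decoded stdout."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the bytes, closes stdin, and waits for the child
    out, _err = proc.communicate(text.encode("utf-8"))
    return out.decode("utf-8")


# Stand-in child process (assumption): echoes its stdin bytes back unchanged
echo_child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

words = "\u65e5\u672c\u8a9e"  # hypothetical sample text
result = send_text(echo_child, words)
```

With the real program, the command would presumably be `["java", "-jar", "unicodeProcessor.jar"]`, and this only works if the Java side is changed to read `System.in` as UTF-8 instead of taking the text as an argument; that is an assumption about a program not shown here.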