I want to write a Python program that writes a string to a file in a way that a Java application can read it. The Java application reads the string in modified UTF-8 format (see this question). As I understand it, this basically means that characters above U+FFFF have to be written as surrogate pairs, as in UTF-16BE. Is there a graceful, efficient way to do this?
Here is what I have tried:
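import re

w_string = "A\U0001F600B"  # sample input, assumed here so the snippet runs
o_string = b''
# Split the text so that every character above U+FFFF becomes its own piece
for splitter in re.split("([\U00010000-\U0010FFFF])", w_string):
    if len(splitter) == 1 and ord(splitter) > 0xFFFF:
        # "UTF-16BE" avoids the byte order mark that plain "UTF-16" prepends
        o_string += splitter.encode("UTF-16BE")
    else:
        # str objects are encoded to bytes; they have no decode() in Python 3
        o_string += splitter.encode("UTF-8")
print(o_string)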
But this seems bulky, and it uses a regex, which looks like an easily avoidable overhead. Can someone propose a better solution?
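For reference, here is a minimal regex-free sketch of what I think the target output looks like, assuming Python 3.4 or later. The function name `encode_modified_utf8` is my own and not part of any library. It relies on two properties of Java's modified UTF-8: each UTF-16 code unit (including each half of a surrogate pair) is encoded on its own as UTF-8, and U+0000 is written as the two bytes `C0 80`.

```python
import struct


def encode_modified_utf8(text: str) -> bytes:
    """Encode text as Java-style modified UTF-8 (hypothetical helper, a sketch)."""
    # Encoding to UTF-16BE turns every character above U+FFFF into a surrogate pair.
    utf16 = text.encode("utf-16-be", "surrogatepass")
    # Unpack the bytes into individual 16-bit code units.
    units = struct.unpack(">%dH" % (len(utf16) // 2), utf16)
    out = bytearray()
    for unit in units:
        if unit == 0:
            # Modified UTF-8 never emits a raw zero byte for U+0000.
            out += b"\xc0\x80"
        else:
            # "surrogatepass" lets a lone surrogate be encoded as three bytes.
            out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


print(encode_modified_utf8("A\U0001F600"))  # b'A\xed\xa0\xbd\xed\xb8\x80'
```

If the Java side reads with `DataInputStream.readUTF`, the payload would also need a 2-byte big-endian length prefix in front of it, for example `struct.pack(">H", len(payload)) + payload`. Whether that applies here depends on how the Java application actually reads the file, which I am only assuming.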