I want to write a Python program that writes a string to a file in such a way that a Java application can read it. The Java application reads the string in modified UTF-8 format (see this question). This basically means that every character above U+FFFF has to be written as a surrogate pair, as in UTF-16, with each surrogate encoded separately. Is there a graceful, efficient way to do this?

I have tried the following to do so:

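import re

w_string = "a\U0001F600b"  # sample input (assumed; the original left w_string undefined)
o_string = b""
# The capturing group makes re.split keep each supplementary character as its own item.
for splitter in re.split("([\U00010000-\U0010FFFF])", w_string):
    if len(splitter) == 1 and ord(splitter) > 0xFFFF:
        # Split into a UTF-16 surrogate pair and encode each surrogate as 3 bytes.
        cp = ord(splitter) - 0x10000
        pair = chr(0xD800 + (cp >> 10)) + chr(0xDC00 + (cp & 0x3FF))
        o_string += pair.encode("utf-8", "surrogatepass")
    else:
        # BMP text is plain UTF-8, except that NUL becomes the two bytes C0 80.
        o_string += splitter.encode("utf-8").replace(b"\x00", b"\xc0\x80")
print(o_string)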

But this seems bulky (it also uses a regex, which seems like easily avoidable overhead). Can someone propose a better solution?
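For comparison, here is a rough regex-free sketch of the kind of thing being asked for. It assumes Python 3 and the usual description of modified UTF-8: U+0000 becomes the bytes C0 80, and each character above U+FFFF becomes two 3-byte sequences, one per surrogate. The names `encode_modified_utf8` and `pack_for_read_utf` are made up for illustration. The second helper only matters if the Java side uses `DataInputStream.readUTF`, which expects a 2-byte big-endian length prefix; that is an assumption about the reader, not something stated above.

```python
import struct


def encode_modified_utf8(text: str) -> bytes:
    """Encode text as modified UTF-8 (hypothetical helper, not a stdlib codec)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL is written as an overlong two-byte sequence.
            out += b"\xc0\x80"
        elif cp > 0xFFFF:
            # Build the UTF-16 surrogate pair, then encode each half as 3 bytes.
            cp -= 0x10000
            pair = chr(0xD800 | (cp >> 10)) + chr(0xDC00 | (cp & 0x3FF))
            out += pair.encode("utf-8", "surrogatepass")
        else:
            # Everything else in the BMP is ordinary UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def pack_for_read_utf(text: str) -> bytes:
    """Prefix the payload with its byte length, as readUTF is assumed to expect."""
    payload = encode_modified_utf8(text)
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length prefix")
    return struct.pack(">H", len(payload)) + payload


if __name__ == "__main__":
    print(encode_modified_utf8("a\U0001F600b"))
```

Iterating character by character avoids the regex, at the cost of one small `encode` call per character; for long strings that may or may not be faster than the split-based version, so it would be worth timing both.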

WorldSEnder
    Does [this question](http://stackoverflow.com/questions/1393004/java-modified-utf-8-strings-in-python) help? (ie, this looks like a duplicate) – Norman Gray Jul 31 '14 at 16:02

0 Answers