
I have a string:

"123456789012"

Is it possible to print it as if it had been written like this?

print("\u1234\u5678\u9012")

using a function, e.g. print_utf8(string)?

  • The second string is not a UTF-8 string. That's a string containing escape sequences. *All* Python 3 strings are Unicode strings already. This page is UTF-8, which means I can write Αυτό εδώ without escaping anything and be sure it will appear just fine. `"123456789012"` itself is a UTF-8 string – Panagiotis Kanavos Oct 12 '20 at 11:48
  • What are you trying to do? You don't need to use any special tricks to work with Unicode - you're already doing it. No special "encodings", no escape sequences – Panagiotis Kanavos Oct 12 '20 at 11:51

1 Answer

# Split the string into chunks of length 4
In [1]: s = "123456789012"

In [2]: codepoints = [s[i:i + 4] for i in range(0, len(s), 4)]

# Convert them into the `\u` format
In [3]: r'\u' + r'\u'.join(codepoints)
Out[3]: '\\u1234\\u5678\\u9012'

# Decode
In [4]: _.encode().decode('unicode-escape')
Out[4]: 'ሴ噸递'

Note that in Python 3, strings are already Unicode, and str has no .decode() method. That's why you first .encode() the string containing the Unicode escapes to bytes and then .decode() those bytes with the unicode-escape codec. See "decode(unicode_escape) in python 3 a string".
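
Since the question asks for a function, the steps above could be wrapped up roughly as follows. This is only a sketch: the name decode_hex_codepoints and the width parameter are made up for illustration, and it assumes every group of four characters is a valid hexadecimal code point. It also swaps the unicode-escape round trip for chr(int(chunk, 16)), which should give the same result for four-digit chunks without building an escaped string first.

```python
def decode_hex_codepoints(digits, width=4):
    """Turn each `width`-character chunk of `digits` into one character.

    Hypothetical helper: each chunk is read as a hexadecimal code point,
    so "1234" becomes the character U+1234.
    """
    if len(digits) % width != 0:
        raise ValueError(f"length of {digits!r} is not a multiple of {width}")
    # Slice the string into fixed-width chunks.
    chunks = [digits[i:i + width] for i in range(0, len(digits), width)]
    # int(chunk, 16) parses the hex digits; chr() maps the number to a character.
    return "".join(chr(int(chunk, 16)) for chunk in chunks)


text = decode_hex_codepoints("123456789012")
print(text)
```

int() raises ValueError for a chunk that is not valid hex, so bad input fails loudly instead of printing something unexpected.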

ForceBru
  • Why not just use the correct string from the start? Why go to all this trouble? – Panagiotis Kanavos Oct 12 '20 at 11:52
  • @PanagiotisKanavos, no idea. Maybe it comes from C or some weird API that doesn't support Unicode. Depends on where OP's data comes from. – ForceBru Oct 12 '20 at 11:54
  • They all support UTF8 if they treat it as single-byte characters. In fact, that's how most Linux applications work - they implicitly assume everything is a char array whose encoding matches LC_ALL or the system locale. Which leads to most SO questions about UTF8, when people that set their *desktop* machines to their locale try to load data from a different locale, or UTF8 – Panagiotis Kanavos Oct 12 '20 at 11:58
  • Most of the problems are caused by people that try to "fix" what doesn't need fixing and use unnecessary encodings or escaping. – Panagiotis Kanavos Oct 12 '20 at 12:00