I am trying to print text containing emojis that is stored in escaped form, text = "\\ud83d\\ude04\\n\\u3082\\u3042", so that it prints as:

# my expected output
# a newline after the emoji, then the Japanese characters
>>>😄
もあ

I have read a question about this, but it only solves part of the problem:

Best and clean way to Encode Emojis (Python) from text file

I followed the code from that post and got the result below:

emoji_text = "\\ud83d\\ude04\\n\\u3082\\u3042".encode("latin_1")
output = (emoji_text
  .decode("raw_unicode_escape")
  .encode('utf-16', 'surrogatepass')
  .decode('utf-16')
)
print(output)

>>>😄\nもあ
# it prints \n instead of a new line

Therefore, I would like to ask, how can I convert the escape sequences \n, \t, \b etc. while converting the emoji and text?

Hui Gordon

1 Answer

Using unicode_escape instead of raw_unicode_escape will decode the \n as well. Though if there is a reason you used raw_unicode_escape in the first place, perhaps this will not be suitable?
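A minimal sketch of that change, assuming the input is a pure-ASCII string of escape sequences like the one in the question; the helper name `decode_escaped` is hypothetical. The UTF-16 round trip from the question is kept because `unicode_escape` leaves `\ud83d\ude04` as two separate surrogates rather than one emoji:

```python
def decode_escaped(text: str) -> str:
    """Decode \\uXXXX, \\n, \\t, ... escapes and merge surrogate pairs."""
    return (
        text.encode("ascii")                # raises UnicodeEncodeError on non-ASCII input
        .decode("unicode_escape")           # handles \uXXXX as well as \n, \t, \b, ...
        .encode("utf-16", "surrogatepass")  # write the lone surrogates as UTF-16 code units
        .decode("utf-16")                   # read each surrogate pair back as one code point
    )


text = "\\ud83d\\ude04\\n\\u3082\\u3042"
output = decode_escaped(text)
```

Calling `print(output)` here should show the emoji, a line break, and then もあ.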

Your choice to encode into "latin-1" is vaguely odd, but perhaps there is a reason for that, too. Perhaps you should encode into "ascii" and be prepared to cope with any possible fallout.
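One possible way to cope with that fallout, sketched under the assumption that the input may mix already-decoded characters with escape sequences. A plain `.encode("ascii")` raises `UnicodeEncodeError` on any non-ASCII character, while the `backslashreplace` error handler turns such characters into escapes that `unicode_escape` then decodes again. The helper name `decode_mixed` is hypothetical:

```python
def decode_mixed(text: str) -> str:
    """Decode escape sequences in text that may also hold non-ASCII characters."""
    return (
        text.encode("ascii", "backslashreplace")  # non-ASCII chars become \xNN / \uXXXX / \UXXXXXXXX
        .decode("unicode_escape")                 # decode every escape, original or generated
        .encode("utf-16", "surrogatepass")        # merge any surrogate pairs ...
        .decode("utf-16")                         # ... into single code points
    )


mixed = "も\\n\\u3042"  # a literal Japanese character followed by escapes

try:
    mixed.encode("ascii")  # the strict default cannot represent "も"
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

result = decode_mixed(mixed)
```

Whether silently accepting non-ASCII input is desirable depends on where the text comes from; failing loudly with the strict default may be the better choice when the input is supposed to be pure ASCII.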

tripleee
  • I replaced it with `unicode_escape`, and it works! Actually, the only reason I used `raw_unicode_escape` and `latin-1` is that I read them in the post above. So I will change `latin-1` to `ascii` as well, if `ascii` is recommended. I am confused by text encoding. – Hui Gordon Mar 18 '21 at 08:15
  • The [Stack Overflow `character-encoding` tag info page](/tags/character-encoding/info) has some background and links to more resources. For Python, probably also read Ned Batchelder's https://nedbatchelder.com/text/unipain.html – tripleee Mar 18 '21 at 08:17
  • Great, it will help a lot. – Hui Gordon Mar 18 '21 at 11:16