Is it possible to convert a string to unicode characters in Python?

For example: Hello => \u0048\u0065\u006C\u006C\u006F

Bravi

1 Answer

You can't, strictly speaking: a string already consists of 'Unicode characters'. What you want to change is not its representation but its actual contents, replacing each character with the text of its escape sequence.

That, fortunately, is as simple as

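import re
text = 'Hello'
# Replace every character with \u followed by its code point as 4 uppercase hex digits.
# re.DOTALL makes '.' match newlines too, so no character is skipped.
print(re.sub('.', lambda m: r'\u%04X' % ord(m.group()), text, flags=re.DOTALL))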

which outputs

\u0048\u0065\u006C\u006C\u006F
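
The same idea can be written without a regular expression. The following is only a sketch: the function name `to_unicode_escapes` is made up for illustration, and the choice to emit the eight-digit `\U` form for code points above U+FFFF is an assumption borrowed from Python's own string-literal syntax, since four hex digits cannot hold those values.

```python
def to_unicode_escapes(text: str) -> str:
    """Return text with every character spelled out as an escape sequence.

    Hypothetical helper, not part of any library.
    """
    parts = []
    for char in text:
        code_point = ord(char)  # integer Unicode code point of the character
        if code_point <= 0xFFFF:
            # Fits in four hex digits: \uXXXX
            parts.append(r'\u{:04X}'.format(code_point))
        else:
            # Assumption: use Python's eight-digit \UXXXXXXXX form beyond U+FFFF
            parts.append(r'\U{:08X}'.format(code_point))
    return ''.join(parts)


print(to_unicode_escapes('Hello'))  # \u0048\u0065\u006C\u006C\u006F
```

If the escapes follow this Python-style form, the result should decode back to the original text with `escaped.encode('ascii').decode('unicode_escape')`.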
Jongware
  • Thank you! What is this conversion called? I've been googling for almost an hour now. I would assume this has already been asked and been answered somewhere. – Bravi Apr 10 '20 at 13:35
  • It's a regular expression that replaces each character with a string spelling out its Unicode code point, through a [custom replacement function](https://stackoverflow.com/questions/18737863/passing-a-function-to-re-sub-in-python). But there are several other approaches possible – this is only one. The basic idea is that each character gets replaced with another string. – Jongware Apr 10 '20 at 13:50
  • I know it's a regular expression and it's using a custom replacement function, I was just asking about converting character H to \u0048.. – Bravi Apr 10 '20 at 14:02
  • `print(''.join(r'\u{:04X}'.format(ord(char)) for char in 'Hello'))` – furas Apr 10 '20 at 14:55
  • @Bravi: you mean this: [`ord` returns an integer representing the Unicode code point of that character](https://docs.python.org/3/library/functions.html#ord). This is converted to hex, that's all. – Jongware Apr 10 '20 at 15:53
  • Thank you, that's what I was looking for @usr2564301. :) – Bravi Apr 10 '20 at 17:12
  • @furas I like that version too, looks pythonic. :) – Bravi Apr 10 '20 at 17:12