Convert string to Unicode characters in Python

Question

Is it possible to convert a string to unicode characters in Python?

For example: Hello => \u0048\u0065\u006C\u006C\u006F

score 1 · Accepted Answer · answered Apr 10 '20 at 13:30

You can't. A string already consists of 'Unicode characters'. What you want to change is not its representation but its actual contents.

That, fortunately, is as simple as

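import re

text = 'Hello'
# Replace every character with \u followed by its code point as 4 uppercase hex digits.
# re.DOTALL makes '.' match newlines too, so they are not left unconverted.
print(re.sub('.', lambda m: r'\u%04X' % ord(m.group()), text, flags=re.DOTALL))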

which outputs

\u0048\u0065\u006C\u006C\u006F
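
One caveat with the `%04X` format: characters above U+FFFF (outside the Basic Multilingual Plane) come out as five or more hex digits after `\u`, which Python would not read back as a single escape. Below is a minimal sketch of a variant, assuming Python-style escapes are the goal, that switches to `\U` with eight digits for those characters. The function name `to_unicode_escapes` is made up for illustration and is not part of the original answer.

```python
def to_unicode_escapes(text):
    # Build one escape sequence per character of the input string.
    parts = []
    for char in text:
        code_point = ord(char)
        if code_point <= 0xFFFF:
            # Fits in four hex digits: use the short \uXXXX form.
            parts.append(r'\u%04X' % code_point)
        else:
            # Above U+FFFF: use the long \UXXXXXXXX form (an assumption
            # that Python-style escapes are what is wanted).
            parts.append(r'\U%08X' % code_point)
    return ''.join(parts)

print(to_unicode_escapes('Hello'))       # \u0048\u0065\u006C\u006C\u006F
print(to_unicode_escapes('\U0001F600'))  # \U0001F600
```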

Jongware

Thank you! What is this conversion called? I've been googling for almost an hour now. I would assume this has already been asked and been answered somewhere. – Bravi Apr 10 '20 at 13:35
It's a regular expression that replaces each character with a string spelling out its Unicode code point, using a [custom replacement function](https://stackoverflow.com/questions/18737863/passing-a-function-to-re-sub-in-python). Several other approaches are possible; this is only one. The basic idea is that each character gets replaced with another string. – Jongware Apr 10 '20 at 13:50
I know it's a regular expression and it's using a custom replacement function, I was just asking about converting character H to \u0048.. – Bravi Apr 10 '20 at 14:02
`print(''.join(r'\u{:04X}'.format(ord(char)) for char in 'Hello'))` – furas Apr 10 '20 at 14:55
@Bravi: you mean this: [`ord` returns an integer representing the Unicode code point of that character](https://docs.python.org/3/library/functions.html#ord). This is converted to hex, that's all. – Jongware Apr 10 '20 at 15:53
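
To make those two steps concrete, here is a small sketch that separates them for the single character `H`. The variable names are only for illustration.

```python
char = 'H'

code_point = ord(char)                   # integer code point of 'H': 72
hex_digits = format(code_point, '04X')   # uppercase hex, zero-padded to 4 digits: '0048'
escaped = '\\u' + hex_digits             # a backslash, a 'u', then the four hex digits

print(code_point, hex_digits, escaped)   # 72 0048 \u0048
```

The result is an ordinary six-character string that starts with a literal backslash. It is not the single character `H` written as an escape.
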
Thank you, that's what I was looking for @usr2564301. :) – Bravi Apr 10 '20 at 17:12
@furas I like that version too, looks pythonic. :) – Bravi Apr 10 '20 at 17:12
