How to convert a string containing Unicode characters to UTF-8 in Python?

Question

I have a string that contains Unicode characters, and I want to convert it to UTF-8 in Python.

s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'

I want to convert s to UTF-8.

Possible duplicate of [How to convert a string to utf-8 in Python](https://stackoverflow.com/questions/4182603/how-to-convert-a-string-to-utf-8-in-python) — GadaaDhaariGeek, Jul 02 '19 at 12:26

score 1 · Accepted Answer · answered Jul 02 '19 at 12:25


Add a `u` prefix to the string literal s, then encode it as UTF-8.

Your code will look like this:

s = u'\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
s_encoded = s.encode('utf-8')
print(s_encoded)

I hope this helps.


GadaaDhaariGeek

2

If the OP is using Python 3 (it seems so), then the `u` prefix isn't necessary. But the `.encode('utf8')` is definitely right. – lenz Jul 02 '19 at 17:59
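To illustrate the comment above, here is a minimal sketch that assumes Python 3, where a plain string literal is already Unicode text. The names `encoded` and `decoded` are only illustrative. It shows that `.encode('utf-8')` returns a `bytes` object and that decoding it gives back the original string.

```python
# Python 3: a plain str literal is already Unicode, so no u prefix is needed.
s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'

encoded = s.encode('utf-8')        # str -> bytes (UTF-8 encoded)
decoded = encoded.decode('utf-8')  # bytes -> str (back to text)

# Each of the 7 characters takes 2 bytes in UTF-8.
print(type(encoded).__name__, len(s), len(encoded))  # bytes 7 14
print(decoded == s)                                   # True
```

Printing `encoded` directly shows the escaped byte values (`b'\xd8\xa8...'`), which is the expected representation of UTF-8 data, not a sign that something went wrong.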

score 0 · Answer 2 · answered Jul 02 '19 at 12:19


Add the line below at the top of your .py file.

# -*- coding: utf-8 -*-

It allows you to encode strings directly in your Python script, like this:

# -*- coding: utf-8 -*-
s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
print(s)

Output:

بیسکویت


Usman

1

The source encoding declaration doesn't really apply here, because the string is entered with ASCII-only characters. It would be different if the string literal was actually composed of Arabic letters (not escape sequences). – lenz Jul 02 '19 at 17:55
A coding line declares the encoding of the *source file* only. If you have only ASCII characters in the source (as above) it does nothing. In fact, in Python 3, UTF-8 is the default source encoding if undeclared. – Mark Tolonen Jul 03 '19 at 06:06
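The point made in these two comments can be checked with a small sketch, assuming Python 3. The source text is built as raw bytes and compiled with different coding lines (the names `line`, `cookies`, and `results` are hypothetical). Because the literal contains only ASCII escape sequences, the declared source encoding should make no difference to the resulting string.

```python
# The source line below is pure ASCII: the Arabic letters are written
# as \uXXXX escape sequences, not as the letters themselves.
line = b"s = '\\u0628\\u06cc\\u0633\\u06a9\\u0648\\u06cc\\u062a'\n"

cookies = (
    b"",                              # no declaration (UTF-8 default in Python 3)
    b"# -*- coding: utf-8 -*-\n",
    b"# -*- coding: latin-1 -*-\n",
)

results = []
for cookie in cookies:
    namespace = {}
    # compile() accepts bytes and honors the coding line, like a real .py file.
    exec(compile(cookie + line, "<demo>", "exec"), namespace)
    results.append(namespace["s"])

# All three variants produce the same 7-character string.
print(len(set(results)), len(results[0]))  # 1 7
```

The declaration would only matter if the literal were typed with the actual non-ASCII letters, since the interpreter then has to know how to decode those bytes in the file.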
