I have a string containing Unicode characters and I want to convert it to UTF-8 in Python.

s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'

I want to convert s to UTF-8.

Javad Karimi
    Possible duplicate of [How to convert a string to utf-8 in Python](https://stackoverflow.com/questions/4182603/how-to-convert-a-string-to-utf-8-in-python) – GadaaDhaariGeek Jul 02 '19 at 12:26

2 Answers

Add the u prefix to the string s, then encode it as UTF-8.

Your code will look like this:

s = u'\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
s_encoded = s.encode('utf-8')
print(s_encoded)

I hope this helps.
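A slightly fuller sketch of the same idea, assuming Python 3: `str.encode('utf-8')` turns the string into a `bytes` object, and `bytes.decode('utf-8')` turns it back. Each of these seven Arabic-script code points takes two bytes in UTF-8, so the encoded result is 14 bytes long. The name `s_decoded` is illustrative only.

```python
# Python 3: str holds Unicode code points; encode() produces UTF-8 bytes.
s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'

s_encoded = s.encode('utf-8')      # bytes object with the UTF-8 encoding
print(len(s), len(s_encoded))      # 7 14 (two bytes per code point here)
print(s_encoded)                   # b'\xd8\xa8\xdb\x8c\xd8\xb3\xda\xa9\xd9\x88\xdb\x8c\xd8\xaa'

# Decoding the bytes with the same codec restores the original str.
s_decoded = s_encoded.decode('utf-8')
print(s_decoded == s)              # True
```

Note that printing the `bytes` object shows escape sequences rather than Arabic letters; printing the `str` itself is what displays the text.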

GadaaDhaariGeek
    If the OP is using Python 3 (it seems so), then the `u` prefix isn't necessary. But the `.encode('utf8')` is definitely right. – lenz Jul 02 '19 at 17:59
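To illustrate this comment, a minimal check, assuming Python 3.3 or later (where the `u` prefix is accepted again under PEP 414 but has no effect): both literals produce the same `str`, and it is the `.encode()` call, not the prefix, that yields UTF-8 bytes. The variable names are hypothetical.

```python
# In Python 3 every str is already Unicode; the u prefix is a no-op.
with_prefix = u'\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
without_prefix = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'

print(type(with_prefix) is str, with_prefix == without_prefix)  # True True

# 'utf8' and 'utf-8' are aliases for the same codec.
print(with_prefix.encode('utf8') == without_prefix.encode('utf-8'))  # True
```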

Add the following line at the top of your .py file.

# -*- coding: utf-8 -*-

It declares the encoding of the source file, which lets you use Unicode strings directly in your Python script, like this:

# -*- coding: utf-8 -*-
s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
print(s)

Output:

بیسکویت 
Usman
    The source encoding declaration doesn't really apply here, because the string is entered with ASCII-only characters. It would be different if the string literal was actually composed of Arabic letters (not escape sequences). – lenz Jul 02 '19 at 17:55
    A coding line declares the encoding of the *source file* only. If you have only ASCII characters in the source (as above) it does nothing. In fact, in Python 3, UTF-8 is the default source encoding if undeclared. – Mark Tolonen Jul 03 '19 at 06:06
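A small sketch of the point made in these comments, assuming Python 3.7+ (for `str.isascii()`): the escaped literal from the question is pure ASCII as source text, so a coding line cannot affect it, while a literal spelling out the Arabic letters is not ASCII and is the only case where the source encoding matters. `ast.literal_eval` evaluates each literal the way the interpreter would; the variable names are hypothetical.

```python
import ast

# Source text of the literal as written in the question: ASCII only,
# because the letters are given as \uXXXX escape sequences.
escaped_source = r"'\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'"
print(escaped_source.isascii())   # True

# Source text of the same literal with the Arabic letters typed directly;
# only this form depends on the file's declared (or default UTF-8) encoding.
literal_source = "'بیسکویت'"
print(literal_source.isascii())   # False

# Both literals evaluate to the same string value.
print(ast.literal_eval(escaped_source) == ast.literal_eval(literal_source))  # True
```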