utf-8 encoding and greek characters

Question

While I managed to get all the data that I need and to save it to a CSV file, the output I get is in UTF-8 format, which is normal (correct me if I'm wrong).

TBH I've already "played" with the .encode() and .decode() options without any results.

Here is my code:

brands = [name.text for name in Unibrands]

Here is the output:

u'Spirulina \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ae'

And this is the desired output:

u'Spirulina Ελληνική'

Basically, you’re looking at the `repr()` output of said string, where it’s normal that you get escape sequences for certain characters. If you `print()` the result as @宏杰李 suggested, then you will properly get your string output. — poke, Jan 04 '17 at 09:16
@宏杰李 I did that and the result is u'Spirulina \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ae' — Volpym, Jan 04 '17 at 09:16
The `u'` prefix means that you’re still looking at the `repr()` output of said string instead of the string content. – Are you printing the string or the `brands` list? — poke, Jan 04 '17 at 09:17
In case it wasn't clear from the tags that I used for the question, I work with BeautifulSoup... — Volpym, Jan 04 '17 at 09:19

lvc · Accepted Answer · 2017-01-04T09:25:50.610


That string is already fine. You're seeing its repr, which escapes certain characters because it is intended to be safe to copy and paste directly into Python source code (in Python 2.x, that means it may contain only printable ASCII characters). For example, \u0395 represents the code point U+0395 GREEK CAPITAL LETTER EPSILON. You're seeing this form because printing a list (or other container) always shows you the repr of its contents. If you print the string directly instead, you should see the appropriate glyphs rather than the escaped form:

>>> print(u'Spirulina \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ae')
Spirulina Ελληνική
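
The same point applies to a whole list such as `brands`. The following is a minimal sketch, written for Python 3, that contrasts converting the container to text with joining its elements. The list contents are hypothetical stand-ins for the scraped data (the second name is invented for illustration). On Python 2 the container form would additionally show the `u` prefix and the `\u` escapes.

```python
# Hypothetical stand-in for the scraped list from the question;
# 'Spirulina Bio' is an invented second entry for illustration.
brands = ['Spirulina \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ae', 'Spirulina Bio']

# Converting the whole list to text applies repr() to each element,
# so every string shows up quoted (and, on Python 2, escaped).
as_container = str(brands)

# Joining the elements works on the string contents directly,
# so no quotes or escape sequences appear.
as_text = '\n'.join(brands)

if __name__ == '__main__':
    print(as_container)
    print(as_text)
```

Printing each element in a loop would have the same effect as the join; the join is used here only because it produces a single value that is easy to inspect.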

You could also consider upgrading to a newer Python version; Python 3.x no longer escapes these letters in the repr, since Python 3 accepts Unicode characters in source files by default.
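
To compare the two representations on Python 3, here is a small sketch using the built-in `ascii()`, which produces an ASCII-only form similar to what Python 2's `repr()` showed (without the `u` prefix). The variable names are illustrative only.

```python
import ast

text = 'Spirulina \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ae'

# On Python 3, repr() keeps printable non-ASCII characters as they are.
py3_repr = repr(text)

# ascii() escapes every non-ASCII character, much like Python 2's repr().
escaped = ascii(text)

# Both forms are valid source literals for the same string, which is
# the property that repr() aims to preserve.
round_trip = ast.literal_eval(escaped)

if __name__ == '__main__':
    print(py3_repr)
    print(escaped)
    print(round_trip == text)
```

Either form evaluates back to the identical string, so the escaped output in the question never indicated damaged data.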

edited Jan 04 '17 at 09:25

answered Jan 04 '17 at 09:19


I think that bs4 is not compatible with Python 3.5 and later versions – Volpym Jan 04 '17 at 10:50
@volpym according to [its pypi page](https://pypi.python.org/pypi/beautifulsoup4/), bs4 does support 3.x. It doesn't matter that it only says 3.4; later 3.x versions are still compatible. – lvc Jan 04 '17 at 12:04
Well, I upgraded to Python 3.0 and started getting the desired output. Thus I marked your answer as accepted ;) – Volpym Jan 08 '17 at 22:04
