Character encoding in python to replace 'u2019' with '

Question

I have tried numerous ways to encode this to the end result "BACK RUSHIN'" with the most important character being the right apostrophe '.

I would like a way of getting to this end result using some of the built in functions Python has where there is no discrimination between a normal string and a unicode string.

This was the code I was using to retrieve the string: str(unicode(etree.tostring(root.xpath('path')[0],method='text', encoding='utf-8'),errors='ignore')).strip()

With the result being: 'BACK RUSHIN' the thing being the apostrophe ' is missing.

Another way was: root.xpath('path/text()')

And that result was: u'BACK RUSHIN\u2019' in python.

Lastly if I try: u'BACK RUSHIN\u2019'.encode('ascii', 'replace')

The result is: 'BACK RUSHIN?'

Please no replace functions, I would like to make use of pythons codec libraries. Also no printing the string because it is being held in a variable.

Thanks

So you want to read `’` (RIGHT SINGLE QUOTATION MARK) from the XML, but translate it to `'` (APOSTROPHE) ? — Robᵩ, Sep 19 '14 at 01:31
This is not a codec problem. As Rob implies, those are two entirely distinct characters. Turning one into the other is a matter of replacement, not encoding. The (misleadingly named) `unidecode` module is no more than a set of replacements from non-ASCII characters to sort-of-similar-looking ASCII ones, for desperate situations when you have to interface with systems that can't do Unicode. Otherwise it's usually a bad idea to mangle strings this way. — bobince, Sep 19 '14 at 04:50

score 15 · Accepted Answer · answered Sep 19 '14 at 01:18

15

>>> import unidecode
>>> unidecode.unidecode(u'BACK RUSHIN\u2019')
"BACK RUSHIN'"

unidecode

answered Sep 19 '14 at 01:18

Ignacio Vazquez-Abrams

776,304
153
1,341
1,358

7

might be worth mentioning that you have to install `unidecode`. – Padraic Cunningham Sep 19 '14 at 01:53

Character encoding in python to replace 'u2019' with '

1 Answers1

Linked

Related