python replace and sub not working with unicode character u"\u0092"

Question

Python Version: Python 3.6. I am trying to replace the Unicode character u"\u0092" (aka curly apostrophe) with a regular apostrophe.

I have tried all of the below:

    import re

    mystring = <some string with problem character>
    # option 1
    mystring = mystring.replace(u"\u0092", u"\u0027")
    # option 2
    mystring = mystring.replace(u"\u0092", "'")
    # option 3
    mystring = re.sub('\u0092', u"\u0027", mystring)
    # option 4
    mystring = re.sub('\u0092', u"'", mystring)

None of the above updates the character in mystring. Other sub and replace operations are working - which makes me think it is either an issue with how I am using the Unicode characters, or an issue with this particular character.

Update: I have also tried the two suggestions below, but neither works:

    mystring.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8")
    mystring.decode("utf-8").replace(u"\u2019", u"\u0027").encode("utf-8")

Both give me the error: AttributeError: 'str' object has no attribute 'decode'
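
For context on that error: in Python 3, `str` is already Unicode text, and only `bytes` objects have a `.decode()` method. The sketch below assumes the data originally arrived as the single byte 0x92 (an assumption, since the question does not say where the string came from; `raw` is a made-up sample). It shows how the codec chosen at decode time decides which code point ends up in the string:

```python
# Hypothetical sample bytes; 0x92 is the curly apostrophe in Windows-1252.
raw = b"it\x92s"

as_cp1252 = raw.decode("cp1252")    # 0x92 becomes U+2019 RIGHT SINGLE QUOTATION MARK
as_latin1 = raw.decode("latin-1")   # 0x92 becomes U+0092, a C1 control character

# ascii() escapes non-ASCII characters so the code points are visible.
print(ascii(as_cp1252))
print(ascii(as_latin1))

# In Python 3 only bytes has decode(); str does not.
print(hasattr("text", "decode"), hasattr(b"text", "decode"))
```

If this is what happened, the two strings look the same in some consoles but contain different characters, so a replacement has to target whichever one is actually present.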

Just to clarify: the IDE is not the core issue here. My question is why, when I run replace or sub with a Unicode character and print the result, the change does not register: the character is still present in the string.

Possible duplicate of [How to replace unicode characters in string with something else python?](https://stackoverflow.com/questions/13093727/how-to-replace-unicode-characters-in-string-with-something-else-python) — wp78de, May 30 '18 at 16:03
str.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8") — wp78de, May 30 '18 at 16:06
Thanks for the suggestion - I saw this on the other question mentioned above but does it work for Python3? When I try it I get the error: AttributeError: 'str' object has no attribute 'decode' — Pamela Kelly, May 30 '18 at 16:21
all strings are unicode in python3. you don't need all that folklore with `u`s everywhere and encoding. just `string.replace("’", "'")` (in fact, i assumed in my answer you were running python2) — bobrobbob, May 30 '18 at 16:30
I get this error if I try to use the character directly - with or without the prefix of the u: SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x92 in position 0: invalid start byte — Pamela Kelly, May 30 '18 at 16:38
There's no need to be rude. The IDE is giving that error NOT when I use \u0092 but when I use the character itself directly (as in copy and paste the curly apostrophe from the console output). I'm guessing because it can't read it. I also tried the alternative code that you suggested and that didn't work either. — Pamela Kelly, May 31 '18 at 08:22

bobrobbob · Answer 1 · 2018-05-30T16:53:02.230

1

your code point is wrong: the curly apostrophe (’) is \u2019, not \u0092. from wikipedia:

U+0092 146 Private Use 2 PU2

that's why eclipse is not happy.
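
The difference between the code points can be checked with the standard-library `unicodedata` module. This is only an illustrative sketch, not part of the original answer; it prints the general category and name of the three characters under discussion:

```python
import unicodedata

for ch in ("\u0092", "\u2019", "\u0027"):
    # C1 control characters such as U+0092 have no Unicode name, so
    # unicodedata.name() would raise ValueError without a default value.
    name = unicodedata.name(ch, "<no name>")
    print("U+{:04X}".format(ord(ch)), unicodedata.category(ch), name)
```

U+0092 reports category `Cc` (a control character), while U+2019 is the character named RIGHT SINGLE QUOTATION MARK.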

with the right code:

    # -*- coding: utf-8 -*-
    import re

    string = u"dkfljglkdfjg’fgkljlf"
    # each line below replaces the curly apostrophe (U+2019) with a plain one
    string = string.replace(u"’", u"'")
    string = string.replace(u"\u2019", u"\u0027")
    string = re.sub(u'\u2019', u"\u0027", string)
    string = re.sub(u'’', u"'", string)

all solutions work

and don't call your vars str
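
When it is unclear which of the two code points a string really contains, one option is to map both to a plain apostrophe in a single pass. A minimal sketch, assuming Python 3; `normalize_apostrophes` and the sample text are hypothetical names made up for illustration:

```python
# Translation table mapping both candidate code points to U+0027.
APOSTROPHE_MAP = str.maketrans({"\u0092": "'", "\u2019": "'"})


def normalize_apostrophes(text):
    """Hypothetical helper: return text with U+0092 and U+2019 replaced by U+0027."""
    return text.translate(APOSTROPHE_MAP)


sample = "it\u0092s and it\u2019s"  # made-up sample containing both characters
cleaned = normalize_apostrophes(sample)

# ascii() escapes non-ASCII characters, so any leftover code point stays visible.
print(ascii(sample))
print(ascii(cleaned))
```

Strings are immutable, so the result has to be assigned to a name, as with `replace` and `re.sub`. Printing with `ascii()` rather than plain `print` avoids relying on how the console renders the character.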

edited May 30 '18 at 16:53

answered May 30 '18 at 16:17

bobrobbob


The first part doesn't work for me because Eclipse doesn't recognise the character directly. And same issue with the second part - when I print the result it is still the same curly comma and fails comparison test. – Pamela Kelly May 30 '18 at 16:24
i never used eclipse but i'd be most surprised if it didn't recognize regular unicode chars – bobrobbob May 30 '18 at 16:29
Sorry - as I mentioned in my question I tried a couple of those and the additional ones also don't work... using the prefix of the u or not doesn't seem to make a difference – Pamela Kelly May 30 '18 at 16:39
