-2

My code takes a list of strings from a static website.

It then iterates over each string in the list and uses the .replace method to replace any non-ASCII character:

foo.replace('\\u2019', "'")

It doesn't replace the character in the list correctly and ends up looking like the following:

before

u'What\u2019s with the adverts?'

after

u'What\u2019s with the adverts?'

Why is the character not being replaced?

  • I've updated my answer, you might want to read it again, and accept it if you find it helpful... :)... – user234461 Apr 30 '18 at 15:49
  • Does this answer your question? [Why doesn't calling a string method do anything unless its output is assigned?](https://stackoverflow.com/questions/9189172/why-doesnt-calling-a-string-method-do-anything-unless-its-output-is-assigned) – Karl Knechtel Aug 06 '22 at 01:56

2 Answers

1

In Python 2.7, a plain string literal is a byte string (str), not unicode, so even though you've tried to put a unicode escape in your argument to foo.replace, replace just sees the six ASCII characters '\', 'u', '2', '0', '1', '9'. This is because Python doesn't assign a special meaning to "\u" unless it is parsing a unicode literal.

To tell Python 2.7 that this is a unicode string, you have to prefix the string with a u, as in foo.replace(u'\u2019', "'").

Additionally, a unicode escape starts with \u, not \\u. The latter means you want an actual '\' in the string followed by a 'u'.
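As a rough illustration of the difference between the two spellings, the sketch below compares them directly. The variable names (escaped, curly) are made up for this example and are not from the question; the code is written so that it should behave the same on Python 2.7 and Python 3.

```python
# '\\u2019' is an escaped backslash followed by the text "u2019".
escaped = '\\u2019'
# u'\u2019' is a single character: RIGHT SINGLE QUOTATION MARK.
curly = u'\u2019'

print(len(escaped))  # 6
print(len(curly))    # 1

foo = u'What\u2019s with the adverts?'

# The six-character sequence never occurs in foo, so replace() has nothing to match.
print(escaped in foo)  # False
print(curly in foo)    # True
```

Since the search string never occurs in foo, replace returns the string unchanged, which matches the before/after output in the question.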

Finally, note that foo will not change as a result of calling replace. Instead, replace will return a value which you must assign to a new variable, like this:

bar = foo.replace(u'\u2019', "'")
print bar

(see stackoverflow.com/q/26943256/4909087)
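Since the question mentions a list of strings, here is a minimal sketch of applying the same fix to every item. The function name fix_apostrophes and the sample list are assumptions for illustration only. The point it shows is that replace returns a new string, so the results have to be collected rather than discarded.

```python
def fix_apostrophes(strings):
    """Return a new list with U+2019 replaced by an ASCII apostrophe."""
    # str/unicode objects are immutable, so build a new list from the
    # values that replace() returns instead of calling it for side effects.
    return [s.replace(u'\u2019', u"'") for s in strings]

# Hypothetical sample data standing in for the scraped strings.
headlines = [u'What\u2019s with the adverts?', u'No curly quotes here']
cleaned = fix_apostrophes(headlines)

print(cleaned[0])                      # What's with the adverts?
print(headlines[0] == cleaned[0])      # False: the original list is untouched
```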

user234461
  • Not the downvoter, but I imagine you would also need to make sure they assign it back (I can see that being another potential problem with OP given their lack of knowledge of unicode strings). – cs95 Apr 30 '18 at 15:23
  • @cᴏʟᴅsᴘᴇᴇᴅ I don't know what you mean by "assign it back". Assign the return value to a variable? They're clearly seeing the return values or else they wouldn't have included them in the answer. – user234461 Apr 30 '18 at 15:37
  • Easier if I show you what could happen: https://stackoverflow.com/q/26943256/4909087 – cs95 Apr 30 '18 at 15:40
  • @cᴏʟᴅsᴘᴇᴇᴅ Ah, okay, so you think he's doing something like this: `foo.replace('\\u2019', "'") ; print foo;`. Wouldn't have thought of that; thank you! – user234461 Apr 30 '18 at 15:46
-2

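Yes. If your string were a raw string, foo = r'What\u2019s with the adverts?', then foo.replace('\\u2019', "'") would work. A raw string is prefixed with r and keeps the backslash and the characters after it literally, so the text really does contain '\', 'u', '2', '0', '1', '9'. A string prefixed with u is Unicode, where \u2019 is a single character. Hope this helps.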
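To make that contrast concrete, here is a small sketch under the assumption that one string holds the literal backslash sequence and the other holds the real character. The names raw and uni are invented for this example.

```python
# Raw literal: the backslash and "u2019" are kept as six separate characters.
raw = r'What\u2019s with the adverts?'
# Unicode literal: \u2019 is decoded into one character.
uni = u'What\u2019s with the adverts?'

# The escaped-backslash pattern matches the raw string...
print(raw.replace('\\u2019', "'"))          # What's with the adverts?
# ...but not the unicode string, which comes back unchanged.
print(uni.replace('\\u2019', "'") == uni)   # True
```

The string in the question is shown with a u prefix, so it appears to be the second case, where the pattern has to be u'\u2019'.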

AJackTi