python .replace not working properly

Question

My code takes a list of strings from a static website.

It then iterates over each string in the list and uses the .replace method to replace any non-ASCII character:

foo.replace('\\u2019', "'")

It doesn't replace the character in the list correctly and ends up looking like the following:

before

u'What\u2019s with the adverts?'

after

u'What\u2019s with the adverts?'

Why is it

I've updated my answer, you might want to read it again, and accept it if you find it helpful... :)... — user234461, Apr 30 '18 at 15:49
Does this answer your question? [Why doesn't calling a string method do anything unless its output is assigned?](https://stackoverflow.com/questions/9189172/why-doesnt-calling-a-string-method-do-anything-unless-its-output-is-assigned) — Karl Knechtel, Aug 06 '22 at 01:56

user234461 · Answer 1 · 2018-04-30T15:49:09.057

1

In Python 2.7, an unprefixed string literal is a byte string (str), not a unicode string, and Python doesn't assign a special meaning to "\u" unless it is parsing a unicode literal. So even though you've tried to include a unicode character in your argument to foo.replace, replace just sees the six ASCII characters '\', 'u', '2', '0', '1', '9'.

To tell Python 2.7 that this is a unicode string, you have to prefix the string with a u, as in foo.replace(u'\u2019', "'").

Additionally, in order to indicate the start of a unicode code, you need \u, not \\u - the latter indicates that you want an actual '\' in the string followed by a 'u'.
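As a rough illustration of the difference between \u and \\u, the sketch below compares the two literals and shows that the call from the question leaves the string untouched. It is written so that it should run on Python 2.7 and Python 3 alike; the variable names are only for illustration.

```python
from __future__ import print_function

escaped = u'\u2019'   # one character: RIGHT SINGLE QUOTATION MARK
literal = u'\\u2019'  # six characters: backslash, 'u', '2', '0', '1', '9'

print(len(escaped), len(literal))

foo = u'What\u2019s with the adverts?'

# The argument from the question is the six-character text, which does not
# occur anywhere in foo, so replace() finds nothing to substitute.
unchanged = foo.replace('\\u2019', "'")
print(unchanged == foo)
```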

Finally, note that foo will not change as a result of calling replace. Instead, replace will return a value which you must assign to a new variable, like this:

bar = foo.replace(u'\u2019', "'")
print bar

(see stackoverflow.com/q/26943256/4909087)
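Since the question mentions a list of strings, here is a minimal sketch of the same point applied to a list. The list contents and the names headlines and cleaned are made up for the example; the idea is only that replace returns a new string, so the results have to be collected somewhere.

```python
from __future__ import print_function

# Hypothetical sample data standing in for the scraped strings.
headlines = [u'What\u2019s with the adverts?', u'It\u2019s fine']

# Calling replace() and discarding the result changes nothing:
# strings are immutable, so the list still holds the originals.
for h in headlines:
    h.replace(u'\u2019', u"'")

# Building a new list keeps the replaced strings.
cleaned = [h.replace(u'\u2019', u"'") for h in headlines]
print(cleaned)
```

Rebinding the original name (headlines = [...]) would work just as well as using a new one.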

edited Apr 30 '18 at 15:49

answered Apr 30 '18 at 15:18


Not the downvoter, but I imagine you would also need to make sure they assign it back (I can see that being another potential problem with OP given their lack of knowledge of unicode strings). – cs95 Apr 30 '18 at 15:23
@cᴏʟᴅsᴘᴇᴇᴅ I don't know what you mean by "assign it back". Assign the return value to a variable? They're clearly seeing the return values or else they wouldn't have included them in the answer. – user234461 Apr 30 '18 at 15:37
Easier if I show you what could happen: https://stackoverflow.com/q/26943256/4909087 – cs95 Apr 30 '18 at 15:40
@cᴏʟᴅsᴘᴇᴇᴅ Ah, okay, so you think he's doing something like this: `foo.replace('\\u2019', "'") ; print foo;`. Wouldn't have thought of that; thank you! – user234461 Apr 30 '18 at 15:46

score -2 · Answer 2 · answered Apr 30 '18 at 15:29

Yes. If your string is a raw string, such as foo = r'What\u2019s with the adverts?', then foo.replace('\\u2019', "'") will work. A raw string is prefixed with r and keeps the backslash sequence as literal characters, whereas a string prefixed with u is Unicode. Hope this helps.
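For what it's worth, the raw-string case can be sketched as below. It only applies when the text really contains a backslash followed by u2019 as separate characters, which is not what the u'...' value shown in the question contains.

```python
from __future__ import print_function

# In a raw literal the backslash is kept, so the text holds the six
# characters backslash, 'u', '2', '0', '1', '9' rather than one quote mark.
raw = r'What\u2019s with the adverts?'

# Here the argument from the question does match, because '\\u2019'
# spells out exactly those six characters.
result = raw.replace('\\u2019', "'")
print(result)
```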

AJackTi

But his string clearly isn't raw, with a literal "\u" in it. If it were, why would his return values be unicode strings with actual unicode single quotes embedded in them? – user234461 Apr 30 '18 at 15:32
Using regex is a solution. It is a clear thing to understand. :) – AJackTi May 01 '18 at 02:28
If using foo.replace(u'\u2019',"'") is right solution. :) – AJackTi May 01 '18 at 13:27
"If the answer were different it would be correct"... hmmm... – user234461 May 01 '18 at 17:21
