Replace nltk string

Question

Why does replace not work with this string?

['Python \xc3\xa9 uma linguagem de programa\xc3\xa7\xc3\xa3o de alto n\xc3\xadvel']

The string above comes from NLTK. Here is the code:

# -*- coding: utf-8 -*-

import nltk

text = 'Python is a high-level programming language'

val = str(nltk.tokenize.sent_tokenize(text))
val = val.replace('\xc3\xa9', 'é')
print val

Don't you want '\xc3\xa9' instead of '\ xc3 \ xa9' ? Spaces matter. — Peteris, Jun 29 '17 at 00:52
No, sorry the correct character is 'é': Val = val.replace ('\ xc3 \ xa9', 'é') — Seomis Adof, Jun 29 '17 at 00:59
Don't you want `text = 'Python \xc3\xa9 uma linguagem de programa\xc3\xa7\xc3\xa3o de alto n\xc3\xadvel'` instead of `text = 'Python is a high-level programming language'`? — hxysayhi, Jun 29 '17 at 01:48
Best solution: Switch to Python 3, then sort it out. You're wasting your time trying to understand how Python 2 handles character encodings. — alexis, Jun 29 '17 at 09:04
I suggest you edit your code sample so that the string `text` _actually_ contains the escape sequence. Then _test your code_ and confirm that it still has the problem you reported. — alexis, Jun 29 '17 at 09:05

hxysayhi · Answer 1 · 2017-06-29T02:16:04.540

0

Try this:

val = val.replace(r'\xc3\xa9', 'é')

or

val = val.replace('\\xc3\\xa9', 'é')

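This is needed because \ is an escape character: in a normal string literal, '\xc3\xa9' denotes two bytes, not the eight literal characters (backslash, x, c, 3, backslash, x, a, 9) that appear in val after calling str() on the list.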

See also: What exactly do “u” and “r” string flags do in Python, and what are raw string literals? This may help.
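To illustrate why the plain literal does not match, here is a minimal sketch. It assumes Python 3, with a bytes literal standing in for the Python 2 byte string from the question, and the `sentences` list is a hypothetical stand-in for the tokenizer output rather than a real NLTK call.

# Python 3 sketch; the bytes literal stands in for a Python 2 byte string.
sentences = [b'Python \xc3\xa9 uma linguagem']  # hypothetical tokenizer output

# str() of a list uses repr() on each element, so the two bytes become
# eight literal characters: backslash, x, c, 3, backslash, x, a, 9.
val = str(sentences)

# '\xc3\xa9' is a two-character string here, which never occurs in val.
unchanged = val.replace('\xc3\xa9', 'é')

# A raw string matches the literal backslash sequence.
fixed = val.replace(r'\xc3\xa9', 'é')

# Alternative: skip str() on the list and decode each element instead.
decoded = [s.decode('utf-8') for s in sentences]

The last line hints at a different design choice: rather than patching the repr of the list one escape at a time, it may be simpler to work with the list elements directly and decode them as UTF-8.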

edited Jun 29 '17 at 02:16

answered Jun 29 '17 at 02:07

hxysayhi


Perfect! This solved my problem: val = val.replace('\\xc3\\xa9', 'é') Thanks very much! – Seomis Adof Jun 29 '17 at 22:19
@SeomisAdof, Please accept the answer if it works, tks vm. – hxysayhi Jun 30 '17 at 02:04
I don't have enough reputation for this yet – Seomis Adof Jul 01 '17 at 19:44
