
Why does replace not work with this string?

['Python \xc3\xa9 uma linguagem de programa\xc3\xa7\xc3\xa3o de alto n\xc3\xadvel']

The string above comes from NLTK. Here is the code:

# -*- coding: utf-8 -*-

import nltk

text = 'Python is a high-level programming language'

val = str(nltk.tokenize.sent_tokenize(text))
val = val.replace('\xc3\xa9', 'é')
print val
niton
  • Don't you want '\xc3\xa9' instead of '\ xc3 \ xa9'? Spaces matter. – Peteris Jun 29 '17 at 00:52
  • No, sorry, the correct character is 'é': val = val.replace('\ xc3 \ xa9', 'é') – Seomis Adof Jun 29 '17 at 00:59
  • Don't you want `text = 'Python \xc3\xa9 uma linguagem de programa\xc3\xa7\xc3\xa3o de alto n\xc3\xadvel'` instead of `text = 'Python is a high-level programming language'`? – hxysayhi Jun 29 '17 at 01:48
  • Best solution: Switch to Python 3, then sort it out. You're wasting your time trying to understand how Python 2 handles character encodings. – alexis Jun 29 '17 at 09:04
  • I suggest you edit your code sample so that the string `text` _actually_ contains the escape sequence. Then _test your code_ and confirm that it still has the problem you reported. – alexis Jun 29 '17 at 09:05
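Here is a minimal Python 3 sketch of what alexis suggests. It assumes Python 3, where source files default to UTF-8 and `str` is Unicode text. It also assumes a hypothetical one-element list that stands in for `nltk.tokenize.sent_tokenize(text)`, so NLTK and its tokenizer data are not required. Because `repr()` in Python 3 leaves printable non-ASCII characters such as 'é' unescaped, calling `str()` on the list gives readable text and no `replace()` is needed.

```python
# Python 3: no coding declaration needed, source is UTF-8 by default.
text = 'Python é uma linguagem de programação de alto nível'

# Hypothetical stand-in for nltk.tokenize.sent_tokenize(text),
# which would return a list of sentence strings.
sentences = [text]

# str() on a list calls repr() on each element; in Python 3 repr()
# keeps printable characters like 'é', 'ç', 'ã', 'í' as they are.
val = str(sentences)
print(val)  # ['Python é uma linguagem de programação de alto nível']
```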

1 Answer


Try this:

val = val.replace(r'\xc3\xa9', 'é')

or

val = val.replace('\\xc3\\xa9', 'é')

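This works because \ is an escape character. In an ordinary string literal, '\xc3\xa9' stands for the two bytes 0xC3 0xA9, which are the UTF-8 encoding of é. Calling str() on a list, however, calls repr() on each element, and repr() writes those bytes out as the eight literal characters \xc3\xa9. The escaped pattern therefore finds nothing to match. A raw string (r'...') or doubled backslashes (\\) spell out that literal backslash text instead.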
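The mismatch can be reproduced with the minimal sketch below. It assumes Python 3 and uses a UTF-8 `bytes` object to imitate the byte strings NLTK returned under Python 2; that list is a hypothetical stand-in, not real NLTK output. Python 3 prints a `b` prefix that the Python 2 output lacks, but the escaped `\xc3\xa9` text is the same.

```python
# Hypothetical stand-in for Python 2's sent_tokenize output:
# a list holding a UTF-8 encoded byte string.
sentences = ['Python é uma linguagem'.encode('utf-8')]

# str() of a list uses repr(), which writes the bytes 0xC3 0xA9
# as the eight literal characters \xc3\xa9.
val = str(sentences)
print(val)  # [b'Python \xc3\xa9 uma linguagem']

# In Python 3, '\xc3\xa9' is the two characters 'Ã©', so nothing matches
# and the string comes back unchanged.
unchanged = val.replace('\xc3\xa9', 'é')
print(unchanged == val)  # True

# A raw string and a doubled-backslash string are the same pattern,
# and both match the literal backslash text produced by repr().
assert r'\xc3\xa9' == '\\xc3\\xa9'
fixed = val.replace(r'\xc3\xa9', 'é')
print(fixed)  # [b'Python é uma linguagem']
```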

This question may help: What exactly do “u” and “r” string flags do in Python, and what are raw string literals?

hxysayhi