
I need to replace the character "»" in a string with whitespace, but I keep getting an error. This is the code I use:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# other code

soup = BeautifulSoup(data, 'lxml')
mystring = soup.find('a').text.replace(' »','')

UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 13: ordinal not in range(128)

But if I test it with this other script:

# -*- coding: utf-8 -*-
a = "hi »"
b = a.replace('»','') 

It works. Why is that?

Moinuddin Quadri
Hyperion
    Googling the error message as-is turns up this: http://stackoverflow.com/questions/5141559/unicodeencodeerror-ascii-codec-cant-encode-character-u-xef-in-position-0 There should be something there you can use. – Ma0 Nov 29 '16 at 17:37

2 Answers


To replace content with the str.replace() method in Python 2, first decode the byte string to Unicode, then do the replacement, and finally encode the result back to the original encoding:

>>> a = "hi »"
>>> a.decode('utf-8').replace("»".decode('utf-8'), "").encode('utf-8')
'hi '
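
The snippet above is Python 2, where a plain string literal is a byte string. As a rough sketch of the same decode, replace, encode round trip in Python 3 (the names `raw`, `text`, `cleaned`, and `result` are made up for illustration, and UTF-8 input is an assumption):

```python
# Python 3 sketch: bytes must be decoded to str before a text replacement.
raw = "hi »".encode("utf-8")       # bytes, as they might arrive from a file or socket

text = raw.decode("utf-8")         # bytes -> str (Unicode)
cleaned = text.replace("»", "")    # the replacement runs on Unicode text
result = cleaned.encode("utf-8")   # str -> bytes again, only if bytes are needed

print(repr(cleaned))  # 'hi '
print(result)         # b'hi '
```

If the value is already a str in Python 3, the replace() call alone should be enough.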

You may also use the following regex to remove all non-ASCII characters from the string:

>>> import re
>>> re.sub(r'[^\x00-\x7f]',r'', 'hi »')
'hi '
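
As a small sketch, the same regex can be wrapped in a reusable helper. The function names here are hypothetical, and the second pattern is a narrower variant that keeps only printable ASCII (0x20 to 0x7E), so it also drops control characters such as newlines:

```python
import re

# Precompiled patterns: anything outside ASCII, and anything outside printable ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7f]")
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7e]")

def strip_non_ascii(text):
    """Remove every character with a code point above 127."""
    return NON_ASCII.sub("", text)

def strip_non_printable_ascii(text):
    """Remove non-ASCII characters and ASCII control characters."""
    return NON_PRINTABLE_ASCII.sub("", text)

print(repr(strip_non_ascii("hi »\n")))            # 'hi \n'
print(repr(strip_non_printable_ascii("hi »\n")))  # 'hi '
```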
Moinuddin Quadri
    The regex version is the fastest. Instead of using `[^\x00-\x7f]` I used `[^\x20-\x7E]` to also remove the ASCII control characters from 0 up to 31 and 127. – Evandro Coan Jun 18 '19 at 17:38

@Moinuddin Quadri's answer fits your use-case better, but in general, an easy way to remove non-ASCII characters from a given string is by doing the following:

# the characters '¡' and '¢' are non-ASCII
string = "hello, my name is ¢arl... ¡Hola!"

all_ascii = ''.join(char for char in string if ord(char) < 128)

This results in:

>>> print(all_ascii)
hello, my name is arl... Hola!

You could also do this:

''.join(filter(lambda c: ord(c) < 128, string))

But that's about 30% slower than the char for char ... approach.
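
For comparison, here is a sketch with both approaches written as functions (the function names are made up). The third variant, which uses the "ignore" error handler of str.encode(), is not from this answer; it is a common alternative and assumes Python 3:

```python
def keep_ascii_genexpr(text):
    # Generator expression: keep characters whose code point is below 128.
    return "".join(ch for ch in text if ord(ch) < 128)

def keep_ascii_filter(text):
    # Same test, expressed with filter() and a lambda.
    return "".join(filter(lambda c: ord(c) < 128, text))

def keep_ascii_codec(text):
    # Alternative (not from the answer): let the ASCII codec drop what it cannot encode.
    return text.encode("ascii", errors="ignore").decode("ascii")

sample = "hello, my name is ¢arl... ¡Hola!"
for func in (keep_ascii_genexpr, keep_ascii_filter, keep_ascii_codec):
    print(func.__name__, "->", func(sample))
```

All three should produce the same string; their relative speed can be checked with the timeit module on your own data.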

blacksite