BeautifulSoup UnicodeEncodeError: ascii codec

Asked Apr 30 '17 at 18:08

Active Apr 30 '17 at 21:01

Viewed 1,132 times

I'm trying to do some parsing with BeautifulSoup:

from bs4 import BeautifulSoup
import requests
import lxml

r = requests.get('https://pythonprogramming.net/parsememcparseface/')

page_text = r.text.encode('utf-8').decode('ascii', 'ignore')

soup = BeautifulSoup(page_text, 'lxml')

print(soup.find_all('p'))

I can't use find_all('p') because it raises a UnicodeEncodeError, while typing just soup.p works fine. I used the page_text variable to re-encode the HTML, but that is not enough. How can I overcome this error and access all the paragraphs on the site?


Maxxer

I ran your code and it doesn't give any error. – Saeed Ghareh Daghi Apr 30 '17 at 18:20
I'm still getting this error `print(soup.find_all('p')) UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 760: ordinal not in range(128)` – Maxxer Apr 30 '17 at 18:21
Same here, no error. It even works if I drop the `page_text` variable and pass `r.content` to `BeautifulSoup` instead. – Tiny.D Apr 30 '17 at 18:22

Your `print()` is failing, not your BeautifulSoup code. You are printing Unicode text to a console that doesn't support the character set. – Martijn Pieters Apr 30 '17 at 18:26
@MartijnPieters I think you got it, I'm using sublime text. Know any solutions? – Maxxer Apr 30 '17 at 18:26
@Maxxer: See [printing UTF-8 in Python 3 using Sublime Text 3](//stackoverflow.com/q/39576308) – Martijn Pieters Apr 30 '17 at 21:01
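To illustrate the point in the comments, here is a minimal sketch that reproduces the failure without BeautifulSoup or a network request. It assumes the console behaves like an ASCII-only text stream, and `safe_text` is a hypothetical helper name, not part of any library. The sample string simply reuses the `'\xa0'` character from the traceback.

```python
import io


def safe_text(text, encoding="ascii"):
    """Escape every character that `encoding` cannot represent.

    Hypothetical helper for illustration: the output stays printable on a
    console limited to `encoding`, at the cost of showing escapes like \\xa0.
    """
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# '\xa0' (non-breaking space) is the character named in the traceback.
sample = "<p>price:\xa0100</p>"

# Stand-in for a console that only accepts ASCII (an assumption about the
# Sublime Text build output in the question).
ascii_console = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

try:
    print(sample, file=ascii_console)
    ascii_console.flush()
    failed = False
except UnicodeEncodeError:
    # Same error class as in the question: the parsing was fine,
    # only writing the text to the stream failed.
    failed = True

# Escaping first makes the text safe for the ASCII-only stream.
escaped = safe_text(sample)
print(escaped, file=ascii_console)
ascii_console.flush()
```

Escaping is only a workaround that keeps the script running. If the real console can display UTF-8, it is usually better to change the output encoding instead, for example by setting the `PYTHONIOENCODING=utf-8` environment variable before Python starts, or by calling `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7 or later.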

