How to input html in python

Question

I want to load an HTML document in Python.

I get this error:

UnicodeDecodeError: 'cp950' codec can't decode byte 0xbb in position 362: illegal multibyte sequence

when using this code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('xxx.html'))
print(soup)

What am I doing wrong?

Possible duplicate of [UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c](https://stackoverflow.com/questions/12468179/unicodedecodeerror-utf8-codec-cant-decode-byte-0x9c) — Max, Sep 23 '17 at 05:26

score 0 · Answer 1 · answered Sep 23 '17 at 05:58

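You are facing an encoding problem: open() decodes the file with your system's default encoding (cp950 here), which does not match the encoding the file was saved in.
Pass the file's encoding explicitly: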

soup = BeautifulSoup(open('xxx.html', encoding='your xxx.html file encoding'))

You can find the file's encoding by searching for 'charset' in the file.
You should see something like charset=utf-8 (or charset=xxx).
The value after the '=' sign is the encoding to pass to open().
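
The manual "search for charset" step can also be done in code. The following is only a sketch using the standard library: the helper names sniff_charset and read_html are made up for illustration, and it assumes the declaration appears within the first 4096 bytes and names an encoding Python knows (otherwise decode() raises LookupError). It falls back to UTF-8 when no declaration is found.

```python
import re


def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared near the start of an HTML document.

    Hypothetical helper: falls back to `default` when nothing is declared.
    """
    # Matches both <meta charset="utf-8"> and
    # <meta http-equiv="Content-Type" content="text/html; charset=big5">.
    match = re.search(
        rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", raw[:4096], re.IGNORECASE
    )
    return match.group(1).decode("ascii") if match else default


def read_html(path: str) -> str:
    """Read an HTML file as text, decoded with its declared charset."""
    # Binary mode avoids the implicit decode with the locale default (cp950).
    with open(path, "rb") as f:
        raw = f.read()
    # errors="replace" keeps going if a few bytes do not fit the charset.
    return raw.decode(sniff_charset(raw), errors="replace")
```

The string returned by read_html('xxx.html') can then be passed to BeautifulSoup in place of the open file. BeautifulSoup can also detect the encoding on its own when given bytes or a file opened in binary mode ('rb'), which avoids the default-encoding decode as well.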
