I am trying to read the contents of an HTML file with BeautifulSoup, but I'm receiving a UnicodeDecodeError.

The file is in the project folder, and it is an HTML file.

I also tried changing the parser from lxml to html.parser, but that doesn't work either.

However, if I use the requests library to fetch the URL, it works. It only fails when I read the HTML file locally.

Answer: I needed to specify the encoding when opening the file. It should look something like this: with open('lap.html', encoding="utf8") as html_file:
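
A minimal sketch of why the explicit encoding matters, using only the standard library. The file name lap_demo.html and its contents are made up for illustration, and cp1252 is assumed here as a stand-in for a platform default codec that produces the 'charmap' error; the actual default on the asker's machine is not stated in the question.

```python
import os
import tempfile

# Hypothetical sample page containing a curly quote (U+201D). Its UTF-8
# encoding includes the byte 0x9d, which cp1252 cannot decode.
html = "<html><body><p>Lap times \u201d</p></body></html>"

# Save the page as UTF-8 in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "lap_demo.html")
with open(path, "w", encoding="utf8") as html_file:
    html_file.write(html)

# Reading with a codec that does not match the file raises the error.
try:
    with open(path, encoding="cp1252") as html_file:
        html_file.read()
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# Reading with the encoding the file was saved in succeeds.
with open(path, encoding="utf8") as html_file:
    content = html_file.read()
```

The decoded string in content is what would then be handed to BeautifulSoup. This probably also explains why the requests route works: there the response body is decoded by requests rather than with the platform default used by open().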

2 Answers


You are passing a file object to BeautifulSoup. Pass the content of the file instead.

Try:

    content = html_file.read()
    source = BeautifulSoup(content, 'lxml')

EL hassane
  • Add the encoding to the file open call, see [charmap' codec can't decode byte python](https://stackoverflow.com/a/9233174/6602608) – EL hassane Dec 02 '20 at 21:04
  • Then edit your answer to contain the actual solution, or let him mark mine, since I already mentioned the encoding – Timeler Dec 02 '20 at 21:17

First of all, rename soruce to source and put a space on each side of the equals sign. Then find out which character cannot be handled by the encoding you are using, because that error refers to a character that can't be decoded or encoded.
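
One way to track down the offending character is to decode the raw bytes yourself and inspect the exception. This is only a sketch: find_undecodable is a hypothetical helper name, the sample bytes are made up, and cp1252 is assumed as the failing codec.

```python
def find_undecodable(data: bytes, encoding: str):
    """Return (offset, bytes) for the first span `encoding` cannot decode, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the codec rejected.
        return err.start, data[err.start:err.end]
    return None


# Made-up sample: a paragraph containing a curly quote, saved as UTF-8.
sample = "<p>\u201d</p>".encode("utf8")

bad_in_cp1252 = find_undecodable(sample, "cp1252")
bad_in_utf8 = find_undecodable(sample, "utf8")
```

For a real file, the bytes could be read with open('lap.html', 'rb') and passed to the helper together with the codec named in the error message.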

Timeler