
I'm trying to get the title of an HTML document in Python, but I'm getting weird symbols. I guess that's because of the encoding, but the HTML document is saved as UTF-8. Is there any way I can get normal letters?

Here is the code and the output I'm getting:

from bs4 import BeautifulSoup

with open("index.html") as file:
    src = file.read()


soup = BeautifulSoup(src, "lxml")

title = soup.title.text

print(title)

Главная страница

zetparson

1 Answer


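You need to specify the encoding when opening the file. Without it, open() falls back to the platform's default encoding, which on Windows is often not UTF-8, so the bytes of each Cyrillic letter get decoded as separate symbols: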

with open("index.html", encoding='utf-8') as file:
    src = file.read()
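
To see the effect in isolation, here is a minimal sketch that uses only the standard library. The sample page and the temporary file are made up for illustration, and latin-1 is just a stand-in for whatever non-UTF-8 default your system uses, so the exact garbage you see may differ:

```python
import os
import tempfile

# Hypothetical sample page, written to a temporary file as UTF-8 bytes.
html = "<html><head><title>Главная страница</title></head></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)

    # Wrong decoder: latin-1 (assumed here as a stand-in for a non-UTF-8
    # platform default) maps every byte to one character, so each
    # two-byte Cyrillic letter becomes two unrelated symbols.
    with open(path, encoding="latin-1") as f:
        garbled = f.read()

    # Right decoder: the encoding the file was actually saved in.
    with open(path, encoding="utf-8") as f:
        correct = f.read()

print("read with utf-8 matches the original:", correct == html)
print("read with latin-1 matches the original:", garbled == html)
```

The garbled string is longer than the original because every Cyrillic letter takes two bytes in UTF-8 and the single-byte decoder emits one character per byte.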
Xiddoc