I want to read all the text from an HTML page that I have stored locally. I managed to read the page's contents, but the output also includes the HTML tags and the JavaScript code.
I am reading from a downloaded HTML file, not from a URL on a website. I am looking for a method that extracts only the text from the HTML page and works with my code below.
How can I make it write only the text that appears in the HTML page to the text file?
Here is my code:
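# Read the whole HTML file into one string
with open("ct.html", "r", encoding="utf-8") as f:
    data = f.read()
# data is a single string, so this loop writes it one character at a time
with open("test.txt", "w", encoding="utf-8-sig") as f:
    for line in data:
        f.write(line)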
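
For reference, here is a rough sketch of the kind of result I am after, using only the standard library's `html.parser`. The names `TextExtractor`, `html_to_text`, and `convert_file` are made up for illustration, and it assumes that skipping everything inside `<script>` and `<style>` is enough to drop the JavaScript and CSS. It has not been tried on badly broken markup, so treat it as a starting point rather than a finished solution.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}  # assumption: only these hold non-visible code

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def get_text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


def convert_file(src_path, dst_path):
    """Read a local HTML file and write only its text to dst_path."""
    with open(src_path, "r", encoding="utf-8") as f:
        data = f.read()
    with open(dst_path, "w", encoding="utf-8-sig") as f:
        f.write(html_to_text(data))
```

With my file names this would be called as `convert_file("ct.html", "test.txt")`. Each text node ends up on its own line, so the original layout of the page is not preserved.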