I am fetching a webpage like this:
import requests
html = requests.get("http://www.google.com/")
This returns a whole lot of junk in the html variable. What I want is only the text that is displayed in the web browser, without HTML tags such as head, link, meta, script, and other non-visible tags and their contents.
I tried doing this with the HTMLParser module, but it only strips the tags themselves and keeps their contents. Any idea how I can achieve this?
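To illustrate the problem, here is a minimal sketch of the kind of tag stripper I mean, assuming Python 3's standard-library html.parser. The TagStripper class, the strip_tags helper, and the sample markup are made up for this example and are not my real page; the point is only that the script body survives in the output.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects every text node and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for all text, including the bodies of <script> and <title>.
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_tags(markup):
    """Return the text of `markup` with the tags removed."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample page, standing in for the real response text.
sample = (
    "<html><head><title>Demo</title>"
    "<script>var x = 1;</script></head>"
    "<body><p>Hello, world</p></body></html>"
)

stripped = strip_tags(sample)
print(stripped)
```

The tags are gone, but the title and the JavaScript source are still mixed in with the visible paragraph text, which is exactly what I want to avoid.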