Parse raw HTML into something meaningful

Question

I am fetching a web page like this:

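import requests

response = requests.get("http://www.google.com/")  # returns a Response object, not a string
html = response.text  # the raw HTML source of the page as a str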

This leaves a whole lot of junk in the `html` variable. What I want is only the data that is displayed in the web browser, without useless content such as the `head`, `link`, `meta`, `script` and other tags and their contents. I tried doing this with the HTMLParser module, but it just strips the tags out. Any idea how I should achieve this?
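Since the question mentions HTMLParser, here is a minimal sketch using the standard library's `html.parser.HTMLParser` that tracks which element it is inside instead of just stripping tags, so text inside `head`, `script`, `style` and similar elements is dropped. The names `VisibleTextExtractor`, `visible_text` and `SKIP_TAGS` are hypothetical, and the set of skipped tags is an assumption. Because it only parses markup, it cannot see elements hidden by CSS or text inserted later by JavaScript.

```python
from html.parser import HTMLParser

# Elements whose text is not shown as page content. This set is an
# assumption for illustration; extend it (e.g. "svg", "iframe") as needed.
SKIP_TAGS = {"script", "style", "noscript", "template", "title"}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping <head> and SKIP_TAGS contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into text
        self.in_head = False   # True while inside <head>...</head>
        self.skip_depth = 0    # > 0 while inside a SKIP_TAGS element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # </head> is optional in HTML, so <body> also ends the head.
            self.in_head = False
        elif tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_head or self.skip_depth:
            return  # inside a non-rendered element
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible-looking text of an HTML string, space-joined."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = """<html><head><title>Demo</title>
<meta charset="utf-8"><link rel="stylesheet" href="s.css">
<script>var x = 1;</script></head>
<body><h1>Hello</h1><p>Fish &amp; chips</p>
<style>p { color: red; }</style></body></html>"""

print(visible_text(sample))  # prints: Hello Fish & chips
```

To use it on a fetched page, pass in the page source, e.g. the `.text` attribute of the object that `requests.get` returns.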

The `html`, `head`, `link`, `meta`, `script`, etc. are part of the HTML that is displayed in the web browser, though. — AndrewL64, Feb 07 '17 at 20:36
As far as I know, they are not displayed in the web browser; they are there for animation or background purposes. By "displayed" I mean only the static output that the user sees. Everything is inside `html`, so leave `html`, but `link`, `meta`, `script`, etc. are junk for me. Correct me if I am wrong... — Zaid Khan, Feb 07 '17 at 20:39
The static elements displayed in your browser depend on the above tags, Zaid (styling of the elements via the `link` tag for CSS, scripts via the `script` tag for JavaScript, etc.). — AndrewL64, Feb 07 '17 at 20:44
Yes, I totally agree with you, but I need to scrape just the text. I don't want any styling or JavaScript code. — Zaid Khan, Feb 07 '17 at 20:48
Check this: http://stackoverflow.com/questions/11709079/parsing-html-using-python Just target the `body` instead of the `container` class in the answer. — AndrewL64, Feb 07 '17 at 20:56
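Following that suggestion, a minimal sketch with BeautifulSoup that targets `body` and removes script-like elements before extracting text. It assumes the third-party `beautifulsoup4` package is installed (`pip install beautifulsoup4`); `body_text` is a hypothetical helper name, and the list of removed tags is an assumption.

```python
from bs4 import BeautifulSoup


def body_text(html):
    """Return the text of <body> with script/style/noscript contents removed."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.body
    if body is None:  # the document has no <body> tag
        return ""
    # Calling a Tag is shorthand for find_all(); decompose() deletes the element.
    for tag in body(["script", "style", "noscript"]):
        tag.decompose()
    # separator keeps words from adjacent elements apart; strip trims each piece.
    return body.get_text(separator=" ", strip=True)


sample = """<html><head><title>Demo</title></head>
<body><h1>Hello</h1><script>var x = 1;</script><p>Fish &amp; chips</p></body></html>"""

print(body_text(sample))  # prints: Hello Fish & chips
```

Removing the `script` and `style` elements first matters because `get_text()` would otherwise include their contents as plain text.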
