I managed to fetch the page source of an external website, but the output is full of \r\n sequences and lots of whitespace.
import urllib.request
request = urllib.request.Request('http://example.com')
response = urllib.request.urlopen(request)
page = response.read()
page = page.strip('\r\n')
print(page)
I tried stripping them, but no luck. How can I get just the HTML?
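For reference, a minimal sketch of one likely fix, assuming Python 3. There, response.read() returns bytes, and printing bytes shows line breaks as literal \r\n escapes; decoding to str first turns them into real newlines. The helper names clean_html and fetch_html and the UTF-8 fallback are assumptions made for illustration, not part of urllib.

```python
import urllib.request


def clean_html(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw page bytes, then drop blank lines and trailing whitespace."""
    text = raw.decode(encoding, errors="replace")
    # splitlines() treats \r\n, \n and \r as line breaks
    lines = (line.rstrip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def fetch_html(url: str) -> str:
    """Fetch a URL and return its cleaned-up HTML as a str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # Prefer the charset from the Content-Type header; UTF-8 is an assumed fallback
        charset = response.headers.get_content_charset() or "utf-8"
        return clean_html(response.read(), charset)
```

Calling print(fetch_html('http://example.com')) should then print the markup with ordinary line breaks. Leading indentation is kept on purpose so the HTML stays readable.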
Secondly, how can I manipulate the returned DOM with JavaScript/jQuery? I was hoping to do something like:
alert(document.getElementsByTagName('h1')[0].innerHTML);
This should alert "Example Domain" when run against the fetched DOM.
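The JavaScript line above relies on a browser's document object, which a string fetched in Python does not provide. A rough Python counterpart, assuming the standard-library html.parser is good enough for the page, might look like the sketch below. The class FirstH1 and the function first_h1_text are hypothetical names, and the sketch collects only the text of the first h1, not nested markup as innerHTML would.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstH1(HTMLParser):
    """Collects the text inside the first <h1> element it sees."""

    def __init__(self):
        super().__init__()
        self.inside = False  # currently between <h1> and </h1>
        self.done = False    # first <h1> has already been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def first_h1_text(html: str) -> Optional[str]:
    """Return the text of the first <h1>, or None if no complete <h1> exists."""
    parser = FirstH1()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

Given the decoded page source as a str, print(first_h1_text(page)) plays the role of the alert call.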