I am trying to use PySide to render a webpage's JavaScript-generated HTML and then use that HTML for web scraping. I started from this quick example, but the results are very inconsistent.
The problem is that some pages work perfectly fine, but others hang indefinitely. I don't mean that I gave up after a few seconds: I have let my script run for hours on several occasions, and it never makes any progress.
My current code is as follows:
import sys
from PySide.QtCore import *
from PySide.QtGui import *
from PySide.QtWebKit import *

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished[bool].connect(self.end)
        self.mainFrame().load(url)
        self.app.exec_()

    def end(self, result):
        print 'end'
        self.finalFrame = self.mainFrame()
        self.app.quit()

r = Render('http://pyside.github.io/docs/pyside/PySide/QtWebKit/index.html')
print r.finalFrame.toHtml().encode('ascii', 'ignore')
print 'done'
This page works, as do the pages given in this answer, but most others ('https://www.google.ca/', 'https://webscraping.com') do not.
How do I get these pages to load?