I am trying to load a web page using PySide's QtWebKit module. According to the documentation (Elements of QWebView; QWebFrame::toHtml()), the following script should print the HTML of the Google Search Page:
from PySide import QtCore
from PySide import QtGui
from PySide import QtWebKit
# Needed if we want to display the webpage in a widget.
app = QtGui.QApplication([])
view = QtWebKit.QWebView(None)
view.setUrl(QtCore.QUrl("http://www.google.com/"))
frame = view.page().mainFrame()
print(frame.toHtml())
But alas it does not. All that is printed is the method's equivalent of a null response:
<html><head></head><body></body></html>
So I took a closer look at the setUrl documentation:
The view remains the same until enough data has arrived to display the new url.
This made me think that I might be calling the toHtml() method too soon, before a response had been received from the server. So I wrote a class that overrides the setUrl method and blocks until the loadFinished signal is emitted:
import time

class View(QtWebKit.QWebView):
    def __init__(self, *args, **kwargs):
        super(View, self).__init__(*args, **kwargs)
        self.completed = True
        self.loadFinished.connect(self.setCompleted)

    def setCompleted(self):
        self.completed = True

    def setUrl(self, url):
        self.completed = False
        super(View, self).setUrl(url)
        while not self.completed:
            time.sleep(0.2)

view = View(None)
view.setUrl(QtCore.QUrl("http://www.google.com/"))
frame = view.page().mainFrame()
print(frame.toHtml())
That made no difference at all. What am I missing here?
EDIT: Merely getting the HTML of a page is not my end goal here. This is a simplified example of code that was not working the way I expected it to. Credit to Oleh for suggesting that I replace time.sleep() with app.processEvents().
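To illustrate why swapping time.sleep() for app.processEvents() matters, here is a toy model in plain Python. It is only a sketch and does not use PySide: ToyEventLoop, ToyView, set_url, pump_events and max_polls are all hypothetical names invented for this example. The assumption it encodes is that a "load finished" notification sits in a queue until something processes pending events, so a loop that only sleeps never sees its flag change.

```python
import time
from collections import deque


class ToyEventLoop:
    """Hypothetical stand-in for a GUI event loop (not a PySide class)."""

    def __init__(self):
        self._pending = deque()

    def post(self, callback):
        # Queue a callback; it runs only when process_events() is called.
        self._pending.append(callback)

    def process_events(self):
        # Run everything queued so far, like one pass of an event loop.
        while self._pending:
            self._pending.popleft()()


class ToyView:
    """Hypothetical view whose 'load finished' callback arrives via the loop."""

    def __init__(self, loop):
        self.loop = loop
        self.completed = True

    def _set_completed(self):
        self.completed = True

    def set_url(self, url, pump_events, max_polls=5):
        self.completed = False
        # The notification is queued, not delivered immediately.
        self.loop.post(self._set_completed)
        polls = 0
        # max_polls keeps the sleeping variant from looping forever.
        while not self.completed and polls < max_polls:
            if pump_events:
                self.loop.process_events()  # lets the queued callback run
            else:
                time.sleep(0.01)  # blocks; nothing delivers the callback
            polls += 1
        return self.completed


sleeping = ToyView(ToyEventLoop()).set_url("http://www.google.com/", pump_events=False)
pumping = ToyView(ToyEventLoop()).set_url("http://www.google.com/", pump_events=True)
print(sleeping, pumping)
```

In this model the sleeping variant gives up with its flag still unset, while the variant that processes events sees the flag flip on the first pass. Real Qt behaviour has more moving parts (network I/O, timers), so treat this as an analogy rather than a description of QtWebKit internals.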