
I am trying to use PySide to render a webpage's JavaScript-generated HTML, then use that HTML for web scraping. I started off using this quick example, but the results are very inconsistent.

The problem is that some pages work perfectly fine, but others hang indefinitely. And I'm not talking about giving up after a few seconds: I've let my script run for hours at various times with no progress being made.

My current code is as follows:

import sys
from PySide.QtCore import *
from PySide.QtGui import *
from PySide.QtWebKit import *

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished[bool].connect(self.end)
        self.mainFrame().load(url)

        self.app.exec_()

    def end(self, result):
        print 'end'
        self.finalFrame = self.mainFrame()
        self.app.quit()

r = Render('http://pyside.github.io/docs/pyside/PySide/QtWebKit/index.html')
print r.finalFrame.toHtml().encode('ascii', 'ignore')
print 'done'

This page works, as do the pages given in this answer, but most others ('https://www.google.ca/', 'https://webscraping.com') do not.

How do I get these pages to load?

GreySage
  • The problem must be at your end, because I have no problem loading any of those web pages. So this is really just a duplicate of your other question on this subject, unless you have some significant new information to add. Have you tried loading those URLs using a different method, such as [urllib2](https://docs.python.org/2/library/urllib2.html#module-urllib2)? – ekhumoro Jan 23 '17 at 23:32
  • Yes, I can load them using urllib2 and it works properly. The URLs I mentioned either never load or take longer than 6 hours (so far). From what I am reading, it might be an SSL error, but none of the suggested fixes I've found can be implemented, for various reasons (I'm not using sockets, the QSslConfiguration module cannot be imported, etc.) – GreySage Jan 23 '17 at 23:48
  • Can you please state which specific versions of PySide and Qt4 you are using, and on what platform? Also, please ensure that you test the code in a standard console, rather than in an IDE or debugger. – ekhumoro Jan 24 '17 at 20:34

1 Answer


The problem seems to be SSL related. I'm still not sure what exactly the problem was, but it was fixed by:

  1. uninstalling the Anaconda version (1.2.1) of PySide and installing it with pip (1.2.4). It seems like the Anaconda build is fundamentally broken, in that various attributes of classes don't exist when they should and there are unresolvable circular dependencies.

  2. downloading OpenSSL (lite) and placing the 2 DLLs (ssleay.dll and libeay.dll) in both the directory where the program is run and the environment/Library/bin directory. Placing them in only one of the two locations did not work. Credit for this part goes to this question.
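
To double-check step 2, something like the following could confirm that both DLLs are present in both locations. This is only a rough sketch: the helper name `missing_ssl_dlls` is made up for illustration, the file names are taken as written above (your OpenSSL build may name them differently), and the `Library/bin` path assumes an Anaconda-style environment layout under `sys.prefix`.

```python
import os
import sys

# DLL names exactly as given in the answer; adjust if your build differs.
SSL_DLLS = ("ssleay.dll", "libeay.dll")


def missing_ssl_dlls(directories, names=SSL_DLLS):
    """Return {directory: [DLL names not found in that directory]}."""
    report = {}
    for directory in directories:
        report[directory] = [
            name for name in names
            if not os.path.isfile(os.path.join(directory, name))
        ]
    return report


if __name__ == "__main__":
    # Assumed locations: the current working directory and the
    # environment's Library/bin (Anaconda-style layout, an assumption).
    candidates = [os.getcwd(), os.path.join(sys.prefix, "Library", "bin")]
    for directory, missing in missing_ssl_dlls(candidates).items():
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print("%s -> %s" % (directory, status))
```

An empty list for a directory means both files were found there. It only checks that the files exist, not that Qt actually manages to load them.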

GreySage