I am trying to scrape several web pages using Python with PyQt4 and Beautiful Soup.

Due to the nature of my overall program, I use a main script, "program.py", that calls functions from other scripts, each doing a different analysis with Beautiful Soup.

Thus, the simplified architecture of my main program.py is as follows:

program.py:

import script1
import script2

script1.function1(urlA)
script2.function2(urlB)

With script1.py and script2.py as follows:

script1.py:

import sys  # needed for sys.argv in Render.__init__
import requests
import re
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import * 

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()   


def function1(url):
    r = Render(url)
    soup = BeautifulSoup(unicode(r.frame.toHtml()))

    #Do many things with soup.
    #Nothing related to PyQT4 further in this script

My script2.py has exactly the same structure, but does other things with another URL.

script2.py:

import sys  # needed for sys.argv in Render.__init__
import requests
import re
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import * 

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()   


def function2(url):
    r = Render(url)
    soup = BeautifulSoup(unicode(r.frame.toHtml()))

    #Do many other things with soup
    #Nothing related to PyQT4 further in this script

Everything works fine with script1.py: function1 and its analyses run successfully.

But script2.py fails, and I get the following error:

QObject::connect: Cannot connect (null)::configurationAdded(QNetworkConfiguration) to QNetworkConfigurationManager::configurationAdded(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationRemoved(QNetworkConfiguration) to QNetworkConfigurationManager::configurationRemoved(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationChanged(QNetworkConfiguration) to QNetworkConfigurationManager::configurationChanged(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::onlineStateChanged(bool) to QNetworkConfigurationManager::onlineStateChanged(bool)
QObject::connect: Cannot connect (null)::configurationUpdateComplete() to QNetworkConfigurationManager::updateCompleted()

I spent some time researching this problem and found that PyQt4 apparently cannot load several pages in the same instance.

The problem is that I need PyQt4 to render the JavaScript before loading the page content into Beautiful Soup.

So I think I need to put some kind of "self.app.quit()" at the end of function1 in script1, so that function2 in script2 can render a page with PyQt4 too. But I have not been able to make it work.
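Both Render classes above build a new QApplication on every call. One pattern sometimes suggested for this kind of setup is to create the application object once per process and hand the same object to every caller. The block below is only a minimal sketch of that idea under stated assumptions: `get_shared_app` and `FakeApp` are hypothetical names that do not exist in PyQt4, and `FakeApp` stands in for QApplication so the sketch runs without Qt installed.

```python
# Sketch: keep a single application object per process and reuse it.
# `make_app` is a hypothetical factory argument; with PyQt4 it would be
# something like `lambda: QApplication(sys.argv)` (assumption, not tested here).

_shared_app = None  # module-level cache, one per Python process


def get_shared_app(make_app):
    """Return the cached application object, creating it on the first call only."""
    global _shared_app
    if _shared_app is None:
        _shared_app = make_app()
    return _shared_app


class FakeApp(object):
    """Stand-in for QApplication; counts how many instances were built."""
    created = 0

    def __init__(self):
        FakeApp.created += 1


# script1 and script2 would both ask the helper instead of building their own app.
app_for_script1 = get_shared_app(FakeApp)
app_for_script2 = get_shared_app(FakeApp)
```

In the real scripts, the helper would live in one shared module, and each Render would call exec_() on the object it returns instead of constructing its own QApplication. Whether this alone removes the error above has not been verified here.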

Vincent
  • This may be a duplicate of http://stackoverflow.com/questions/21909907/pyqt-class-not-working-for-the-second-usage, but I was unable to make that solution work for me. – Vincent Nov 15 '15 at 01:11
  • I've expanded the example code in my answer to that question to make it a little more flexible. – ekhumoro Nov 16 '15 at 18:37

1 Answer

How about this:

r = Render(url)
soup = BeautifulSoup(unicode(r.frame.toHtml()))

r.app.quit()
furas
  • Hi furas, thanks, but it does not work: I still get the same error after adding "r.app.quit()" at the very end of my scripts. – Vincent Nov 15 '15 at 01:10