I am trying to scrape several web pages using Python with PyQt4 and Beautiful Soup.
Because of how my overall program is organised, a main script, "program.py", calls functions from other scripts, each of which runs a different analysis with Beautiful Soup.
The simplified architecture of program.py is as follows:
program.py :
import script1
import script2
script1.function1(urlA)
script2.function2(urlB)
script1.py and script2.py look like this:
script1.py :
import sys  # needed for sys.argv below
import requests
import re
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()  # blocks until _loadFinished calls quit()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

def function1(url):
    r = Render(url)
    soup = BeautifulSoup(unicode(r.frame.toHtml()))
    # Do many things with soup.
    # Nothing related to PyQt4 further in this script
script2.py has exactly the same structure, but runs a different analysis on another URL.
script2.py :
import sys  # needed for sys.argv below
import requests
import re
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()  # blocks until _loadFinished calls quit()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

def function2(url):
    r = Render(url)
    soup = BeautifulSoup(unicode(r.frame.toHtml()))
    # Do many other things with soup
    # Nothing related to PyQt4 further in this script
Everything works fine with script1.py: function1 and its analyses run successfully.
But script2.py fails with the following error:
QObject::connect: Cannot connect (null)::configurationAdded(QNetworkConfiguration) to QNetworkConfigurationManager::configurationAdded(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationRemoved(QNetworkConfiguration) to QNetworkConfigurationManager::configurationRemoved(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationChanged(QNetworkConfiguration) to QNetworkConfigurationManager::configurationChanged(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::onlineStateChanged(bool) to QNetworkConfigurationManager::onlineStateChanged(bool)
QObject::connect: Cannot connect (null)::configurationUpdateComplete() to QNetworkConfigurationManager::updateCompleted()
After some searching, I found that PyQt4 apparently cannot load several pages in the same instance.
The problem is that I need PyQt4 to render the JavaScript before loading the page content into Beautiful Soup.
So I think I need some kind of "self.app.quit()" at the end of function1 in script1.py, so that function2 in script2.py can render a page with PyQt4 too. But I have not been able to make it work.
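An alternative I am considering, instead of quitting and recreating the application, is to create the QApplication once and let both scripts reuse it. Below is a minimal sketch of that idea, not something I have verified with PyQt4. The helper name `get_app`, its `factory` parameter, and the idea of putting it in a small shared module are all my own assumptions. The `factory` hook only exists so the helper can be tried without Qt installed.

```python
import sys

_app = None  # the one application object shared by script1 and script2


def get_app(factory=None):
    """Return the shared application, creating it on the first call only.

    `factory` is a hypothetical hook: any callable that takes argv and
    returns an application object. When omitted, PyQt4's QApplication is used.
    """
    global _app
    if _app is None:
        if factory is None:
            # Imported lazily so this module can be loaded without Qt.
            from PyQt4.QtGui import QApplication
            # Reuse an application created elsewhere, if there is one.
            existing = QApplication.instance()
            _app = existing if existing is not None else QApplication(sys.argv)
        else:
            _app = factory(sys.argv)
    return _app
```

With this, each Render.__init__ would use "self.app = get_app()" in place of "self.app = QApplication(sys.argv)". I am assuming that exec_() can be entered again on the same application object after quit(), which I have not confirmed.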