I'm trying to render websites in PyQt whose content is generated with JavaScript. The first site renders without problems and I can scrape the information I need. But when I use the same class to render a second site and retrieve its data, Python tells me that the frame attribute set inside the Render class does not exist, even though it worked perfectly well for the first site.

Why is this happening? Am I missing something fundamental in Python? My understanding is that once the first site has been rendered, the object is garbage collected and the second one can be rendered. Here is the code:
import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
from lxml import html


class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()


urls = ['http://pycoders.com/archive/', 'http://us4.campaign-archive2.com/home/?u=9735795484d2e4c204da82a29&id=64134e0a27']
for url in urls:
    r = Render(url)
    result = r.frame.toHtml()
    # This step is important. Converting QString to ASCII for lxml to process.
    # QString should be converted to a string before being processed by lxml.
    formatted_result = str(result)
    # Next, build the lxml tree from formatted_result.
    tree = html.fromstring(formatted_result)
    # Now, using the correct XPath, fetch the URLs of the archives.
    archive_links = tree.xpath('//div[@class="campaign"]/a/@href')[1:5]
    print(archive_links)
The error message I'm getting:
  File "javaweb2.py", line 24, in <module>
    result = r.frame.toHtml()
AttributeError: 'Render' object has no attribute 'frame'
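To rule out lxml and the XPath, here is a Qt-free sketch of what I suspect might be going on. It rests on an assumption I have not verified: that the second call to exec_() returns before loadFinished fires, so _loadFinished never runs and frame is never set. FakeLoop and Page are made-up names for illustration only; they are not PyQt classes and do not claim to reproduce Qt's actual behaviour.

```python
class FakeLoop:
    """Hypothetical stand-in for an event loop; NOT part of PyQt."""

    def __init__(self):
        self.has_quit = False
        self.pending = []

    def call_later(self, callback):
        # Queue a callback to be dispatched by run().
        self.pending.append(callback)

    def run(self):
        # Assumption (unverified for Qt): once the loop has been quit,
        # a later run() returns immediately without dispatching anything.
        while self.pending and not self.has_quit:
            self.pending.pop(0)()

    def quit(self):
        self.has_quit = True


class Page:
    """Mirrors the shape of Render: the attribute is set only in a callback."""

    def __init__(self, loop, url):
        self.loop = loop
        self.url = url
        loop.call_later(self._load_finished)
        loop.run()

    def _load_finished(self):
        self.frame = "<html for %s>" % self.url
        self.loop.quit()


loop = FakeLoop()
first = Page(loop, "first")
second = Page(loop, "second")

print(hasattr(first, "frame"))   # True: the callback ran
print(hasattr(second, "frame"))  # False: the callback never ran
```

If this model is anywhere near right, accessing second.frame fails with the same kind of AttributeError as in my traceback, which would mean the problem is about the callback not being invoked rather than about garbage collection.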
Any help would be much appreciated!