I wrote a piece of code to scrape a website. It works with a single URL, but as soon as I put more than 2 URLs in the .txt file it fails with 'Segmentation Fault'. I have no idea where the problem is. Any help will be appreciated.

import sys
import time
import gc
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import * 


class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        #self.deleteLater()
        self.app.quit()

with open('/blah/blah/blah/blah/blah.txt') as f:
    urls = f.read().splitlines()

    for i in urls:
        r = Render(i)
        soup = BeautifulSoup(unicode(r.frame.toHtml()))
        summary = soup.find('div',{'style' : 'padding-top:10px;'})
        tables = summary.find('tbody')
        count = 0
        print 
        for row in tables.findAll('tr'):
            for cell in row.findAll('td'):
                data = cell.getText()
                if (count < 15):
                    data = data + ';'
                    print data, 
                count += 1
                if (count==16):
                    print data
                    count = 0       

Well, that's the code. I get 2 iterations of the for loop before it reports a Segmentation Fault... :( In other words, I manage to scrape 2 URLs out of the 6 in the .txt file.

Thanks in advance for the help

Mangu Singh Rajpurohit
Enaggi
  • What's the exact error? – Peter Wood Nov 13 '15 at 12:19
  • In the .txt file there are 6 URLs I want to scrape. When I run the code, it takes the first one and gives me the output, takes the second one and gives me the output, and right after that says 'Segmentation Fault'. I don't understand why... – Enaggi Nov 13 '15 at 12:34
  • How exactly is the Segmentation Fault presented? – Peter Wood Nov 13 '15 at 18:12
  • If you only do the third URL do you get the fault? – Peter Wood Nov 13 '15 at 18:13
  • Hey, thanks for the help ;) Yeah, it does work with only the 3rd URL. I don't really understand the question "How exactly is the Segmentation Fault presented?", but I will try to explain the situation. I execute the code from the terminal and can see the output appearing there. I see the first output, then the second output, and right after that it says "Segmentation Fault" and stops the execution. – Enaggi Nov 13 '15 at 18:36
  • What are the URLs? What version of Python are you using? What operating system are you on? – Peter Wood Nov 15 '15 at 21:31
  • The URLs look like this one: http://simplesoccerstats.com/stats/teamstats.php?lge=6&type=corners&season=2 I'm on Python 2.7, on Kali Linux. – Enaggi Nov 17 '15 at 16:00
  • Hey.. I have the same exact issue.. Did you manage to find a solution? – naveed Dec 21 '15 at 09:12

1 Answer

I managed to reproduce the problem. The following code is enough to make Python crash (on Windows); none of the file reading or BeautifulSoup code is needed:

for _ in range(3):
    r = Render('google.com')

If I make sure the first Render object is deleted before creating the second, then there is no error:

for _ in range(3):
    r = Render('google.com')
    del r

I found this related question, which says you can't have multiple PyQt applications in one process. I'm not familiar with PyQt, so I don't know how you'd solve this. It's probably simple, but you'll have to search a little.
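One possible way around the one-application-per-process limit, offered as an untested assumption rather than a confirmed fix for this script, is to give every URL its own interpreter process, so that each QApplication lives and dies in a separate process. In the sketch below, `CHILD_CODE` is a hypothetical stand-in that only echoes the URL; in practice it would be replaced by a script that renders a single page and prints what you need. `run_isolated` and `run_all` are illustrative names, not part of any library.

```python
import subprocess
import sys

# Hypothetical stand-in for the real per-URL worker. It only echoes the URL
# it receives; a real worker would create one QApplication and render it.
CHILD_CODE = "import sys; print('rendered ' + sys.argv[1])"


def run_isolated(url, child_code=CHILD_CODE):
    """Run child_code in a fresh Python process, passing url as sys.argv[1]."""
    output = subprocess.check_output([sys.executable, "-c", child_code, url])
    return output.decode("utf-8").strip()


def run_all(urls):
    """Process the URLs one after another, each in its own process."""
    return [run_isolated(url) for url in urls]
```

Starting a new process per URL is slow, but a crash in one child then cannot take the main loop down with it.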

Also, this question has code almost identical to yours, and has a very good answer showing how to create a single QApplication and fetch multiple URLs. Your question should probably be closed as a duplicate.
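For orientation, here is a minimal sketch of the single-QApplication idea. It is not the code from the linked answer and I have not run it against your URLs. It assumes PyQt4 with QtWebKit is installed; `render_all` and the `handle_html(url, html)` callback are hypothetical names introduced for illustration. One QWebPage is reused, and the next URL is loaded from the `loadFinished` handler, so the event loop is started only once.

```python
import collections
import sys


def render_all(urls, handle_html):
    """Load each URL in turn, reusing one QApplication and one QWebPage.

    handle_html(url, html) is a hypothetical callback called per loaded page.
    """
    queue = collections.deque(urls)
    if not queue:
        # Nothing to load: do not start an event loop that would never quit.
        return

    # Imported lazily so this module imports even where PyQt4 is missing.
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebPage

    # Exactly one application object for the whole run.
    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebPage()
    frame = page.mainFrame()

    def load_next():
        # Start loading the URL at the head of the queue, or stop the loop.
        if queue:
            frame.load(QUrl(queue[0]))
        else:
            app.quit()

    def on_load_finished(ok):
        if not queue:
            return  # ignore a stray signal after the last page
        url = queue.popleft()
        if ok:
            handle_html(url, frame.toHtml())
        else:
            sys.stderr.write("failed to load %s\n" % url)
        load_next()

    frame.loadFinished.connect(on_load_finished)
    load_next()
    app.exec_()
```

In your script the callback would hold the BeautifulSoup parsing from the loop body; under Python 2 the HTML would still need the `unicode(...)` conversion you already use.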

Community
Peter Wood
  • Thanks a lot for your time and effort, but it is still giving me a Segmentation Fault after doing as you say. I read that article before but it didn't help me, which is why I opened another one. I don't know what to do anymore... – Enaggi Nov 17 '15 at 16:03
  • Didn't see that you edited it before. Will check it out after work! Thanks! – Enaggi Nov 18 '15 at 09:08
  • @Enaggi Sorry, I just tried the code and it doesn't work. Try the code at the link. I don't have time to fix it, so will remove it to avoid confusing others. – Peter Wood Nov 18 '15 at 09:26
  • Yeah, I did realize that. No worries, and thanks for the help. Will check the link again and see how I can modify it to use in mine. – Enaggi Nov 19 '15 at 10:24