
I have two scripts that work as expected when run separately. The first is a PyQt5 GUI application, and the second is very similar to this one, slightly modified to convert the contents in case they contain smiley faces that cause problems.

Basically, when I press a button in my app window, I want the second script to run.

No matter how hard I try to fit in the second script, it always crashes my app (or Python). The furthest I was able to get is that the second script works after I close my main window: then it runs and gives me the result I want.

I suspect it has to do with __init__ from the second script not being happy that there's already another __init__ from the main window running? As you can tell, I'm very confused about the object-oriented part of Python. No matter how hard I've tried to educate myself on the subject over the past few days, I was unable to fit those two scripts together.

My app:

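import sys

from PyQt5.QtWidgets import (QApplication, QHBoxLayout, QLabel, QLineEdit,
                             QPushButton, QVBoxLayout, QWidget)
from PyQt5.QtWebEngineWidgets import QWebEngineView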

class MainWindow(QWidget):
    def __init__(self, parent=None):
        super(MainWindow, self).__init__(parent)
        self.text = QWebEngineView(self)
        self.proc_btn = QPushButton('Proceed')
        self.userUrl = QLineEdit(self)
        self.labOne = QLabel(self)
        self.labTwo = QLabel(self)
        self.defUrl = 'default'
        self.init_ui()



    def init_ui(self):
        v_layout = QVBoxLayout()
        h_layout = QHBoxLayout()

        h_layout.addWidget(self.proc_btn)
        h_layout.addWidget(self.userUrl)

        v_layout.addWidget(self.text)
        v_layout.addWidget(self.labOne)
        v_layout.addWidget(self.labTwo)

        v_layout.addLayout(h_layout)

        self.labOne.setText('URL: ')
        self.labTwo.setText('<ENTER LINK PLEASE>')
        self.userUrl.returnPressed.connect(self.linkPut)
        self.proc_btn.clicked.connect(self.doStuff)
        self.setLayout(v_layout)
        self.setWindowTitle('Scrapper')
        self.show()



    def doStuff(self):
        print('Doing stuff (expecting 2nd script to be ran)')

    def linkPut(self):
        newText = (self.userUrl.text())
        print('newText: ' + newText)
        self.labTwo.setText(newText)
        self.defUrl = newText


app = QApplication(sys.argv)
a_window = MainWindow()
sys.exit(app.exec_())

The script I need to integrate:

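import re
import sys

import bs4 as bs
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication
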
class Page(QWebEnginePage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ''
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        self.app.exec_()
        print('__init__ WORKS')

    def _on_load_finished(self):
        self.html = self.toHtml(self.Callable)
        print('Load finished')

    def Callable(self, html_str):
        self.html = html_str
        self.app.quit()


_nonbmp = re.compile(r'[\U00010000-\U0010FFFF]')


def _surrogatepair(match):
    char = match.group()
    assert ord(char) > 0xffff
    encoded = char.encode('utf-16-le')
    return (
        chr(int.from_bytes(encoded[:2], 'little')) + 
        chr(int.from_bytes(encoded[2:], 'little')))

def with_surrogates(text):
    return _nonbmp.sub(_surrogatepair, text)


def main():
    page = Page('https://somenicepage.com/')
    soup = bs.BeautifulSoup(page.html, 'html.parser')
    longStrCoded = str(soup.find("img", {"class":"pictures"}))
    longStr = with_surrogates(longStrCoded)
    print('long str: ' + longStr)
    extract = longStr.split('src="')[1].split('"')[0]
    print(extract)

if __name__ == '__main__': main()
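For reference, `with_surrogates` rewrites every character above U+FFFF (which includes most emoji) as the two UTF-16 surrogate code units that encode it. The self-contained copy of the helper below checks this on one emoji; U+1F600 and the sample string are arbitrary choices for illustration.

```python
import re

# Matches any code point outside the Basic Multilingual Plane (above U+FFFF)
_nonbmp = re.compile(r'[\U00010000-\U0010FFFF]')


def _surrogatepair(match):
    char = match.group()
    assert ord(char) > 0xffff
    # UTF-16-LE encodes a non-BMP character as two 2-byte little-endian units
    encoded = char.encode('utf-16-le')
    return (
        chr(int.from_bytes(encoded[:2], 'little')) +
        chr(int.from_bytes(encoded[2:], 'little')))


def with_surrogates(text):
    return _nonbmp.sub(_surrogatepair, text)


converted = with_surrogates('smile \U0001F600!')
# Print code points rather than the string itself: lone surrogates can't be encoded for stdout
print([hex(ord(c)) for c in converted[6:8]])  # ['0xd83d', '0xde00']
```

One emoji becomes two `str` characters, so the converted string is one character longer than the input.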
NumeroSMG
  • `import other_py_file` at the start of the file, and then `other_py_file.main()` where you want to trigger it – Maarten Fabré Aug 03 '17 at 10:18
  • I already tried this. It crashes if I use it within my app, but if I close my app first and then run `other_py_file.main()`, it works. – NumeroSMG Aug 03 '17 at 10:31
  • can you be more specific than 'it crashed' – Maarten Fabré Aug 03 '17 at 12:22
  • "Python has stopped working. A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available." This also sometimes happens when I make a syntax typo, etc. It occurs when working with PyQt5, for some reason. – NumeroSMG Aug 03 '17 at 13:33
  • Then I would try to simplify this as much as possible and add complexity step by step to see which step causes the problem – Maarten Fabré Aug 03 '17 at 13:37
  • Don't create a `QApplication` inside your `Page` class. There should only be one instance of `QApplication` and you have correctly created one in your first code sample. Out of interest, are you following some tutorial for the `QWebEngine` stuff? I see many people on this site doing the exact same thing and creating `QApplication` in the `__init__` – user3419537 Aug 03 '17 at 14:39
  • For my first script I followed this YouTube tutorial [link](https://www.youtube.com/watch?v=aiCr9pkE5AI&index=14&list=PLZocUikpczs-Yud2lyFpSNQOvxuPUVBDp). For the second script, I put a link in the first paragraph of my question to where I took it from. – NumeroSMG Aug 03 '17 at 14:54
  • How would I go about not creating `QApplication` inside `Page`? If I just comment it out and run the second script standalone, it does nothing and the console outputs `Process finished with exit code -1073741819 (0xC0000005)`. If I import the script in my first code and run it in the function `doStuff()`, it just crashes outright, as described in my previous comments. – NumeroSMG Aug 03 '17 at 15:04

1 Answer


The problem is that in combining the two files you are attempting to create multiple instances of QApplication, which is not allowed. In addition, the QApplication class is intended to encapsulate your entire application logic and is responsible for event handling, etc. You generally should not be creating it inside some other class, as you are doing in Page.__init__.

Typically, you would create and start QApplication close to your program entry point. You are doing this correctly in the first block of code.

if __name__ == '__main__':
    app = QtWidgets.QApplication(sys.argv)  # Instantiate application
    window = MainWindow()  # The rest of your program logic should flow from here
    sys.exit(app.exec_())  # Start application event loop

The asynchronous nature of QtWebEngine complicates things a little, as your program will not wait for the page to load before moving on to the next instruction. I believe people are starting the QApplication inside the page class as a quick and dirty (or naive) way to force the program to wait for the page to finish loading: `self.app.exec_()` blocks inside `__init__` until the `toHtml` callback calls `self.app.quit()`. This might be fine in a Python script where Qt is only used for QtWebEngine's ability to render a dynamic webpage, but it is poor practice for a real Qt application. The correct way to deal with this problem is via callbacks or Qt's signals and slots system.

Based on your original class, here is a version that uses a callback to continue processing the HTML once it is fully loaded.

from PyQt5 import QtCore, QtWebEngineWidgets


class Page(QtWebEngineWidgets.QWebEnginePage):

    def __init__(self, url):
        super(Page, self).__init__()
        self.url = QtCore.QUrl(url)
        self.callback = None
        self.html = ''
        self.loadFinished.connect(self.on_load_finished)

    def load_html(self, callback=None):
        self.callback = callback
        self.load(self.url)

    def on_load_finished(self):
        self.toHtml(self.on_html_ready)

    def on_html_ready(self, html):
        self.html = html
        if self.callback:
            self.callback(html)
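To see the control flow of this class without Qt, here is a Qt-free analogy (not PyQt code). `FakePage`, `EVENT_QUEUE` and `run_event_loop` are hypothetical stand-ins for the page, Qt's pending events and `app.exec_()`, and the HTML string is made up; the method names mirror `Page`.

```python
from collections import deque

# Hypothetical stand-in for Qt's queue of pending events
EVENT_QUEUE = deque()


def run_event_loop():
    """Process queued events until none are left (a stand-in for app.exec_())."""
    while EVENT_QUEUE:
        handler, args = EVENT_QUEUE.popleft()
        handler(*args)


class FakePage:
    """Mimics Page: loading and HTML retrieval both complete later, via events."""

    def __init__(self, url):
        self.url = url
        self.callback = None
        self.html = ''

    def load_html(self, callback=None):
        self.callback = callback
        # Like QWebEnginePage.load(): returns immediately, finishes later
        EVENT_QUEUE.append((self.on_load_finished, ()))

    def on_load_finished(self):
        # Like toHtml(callback): the HTML is delivered by a later event
        fake_html = '<html><body>content of %s</body></html>' % self.url
        EVENT_QUEUE.append((self.on_html_ready, (fake_html,)))

    def on_html_ready(self, html):
        self.html = html
        if self.callback:
            self.callback(html)


received = []
page = FakePage('https://example.com/')
page.load_html(callback=received.append)
print(repr(page.html))  # '' because nothing has loaded yet
run_event_loop()
print(received[0])      # the callback has now been given the HTML
```

The first `print` is the important part: code placed right after `load_html` cannot see the HTML yet, so anything that depends on it has to live in the callback.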

Next, define the callback that will handle the loaded page. Here you can place the code from your main() function.

def do_stuff(html):
    soup = bs.BeautifulSoup(html, 'html.parser')
    longStrCoded = str(soup.find("img", {"class":"pictures"}))
    longStr = with_surrogates(longStrCoded)
    print('long str: ' + longStr)
    extract = longStr.split('src="')[1].split('"')[0]
    print(extract)  
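Because the callback only needs a string, its slicing step can be checked without Qt or a network. This is a minimal sketch: `extract_src` is a hypothetical helper, and the tag string is made up to stand in for `str(soup.find(...))`. Unlike the one-liner in `do_stuff`, it returns `None` instead of raising `IndexError` when there is no `src="`. That happens, for example, when `soup.find` finds no match and returns `None`, whose `str()` is `'None'`.

```python
def extract_src(tag_text):
    """Same slicing as do_stuff: the text between the first src=" and the next "."""
    parts = tag_text.split('src="')
    if len(parts) < 2:
        return None  # no src attribute, e.g. tag_text == str(None)
    return parts[1].split('"')[0]


# Hypothetical tag text standing in for str(soup.find("img", {"class": "pictures"}))
sample = '<img class="pictures" src="https://example.com/cat.jpg"/>'
print(extract_src(sample))  # https://example.com/cat.jpg
```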

Finally, you would load the page like this in your MainWindow class.

def doStuff(self):
    self.page = Page(self.userUrl.text())
    self.page.load_html(callback=do_stuff)

Note the use of self here. If we do not store the page instance on the window, it will be garbage-collected before it has finished loading, and the callback will never be called.
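The lifetime issue can be reproduced in plain Python as an analogy (this is not PyQt code). `Loader`, `Window` and `run_event_loop` are hypothetical. The weak references model a parentless `Page`, which, roughly speaking, is kept alive only by its Python references. A loader held only in a local variable is freed when the method returns (immediately, in CPython), so it never finishes. One stored on `self` survives until the loop runs.

```python
import weakref


class Loader:
    """Hypothetical stand-in for Page: it finishes only when the loop runs."""

    def __init__(self):
        self.finished = False


# The fake event loop holds only weak references, so it does not keep loaders alive
pending = []


def run_event_loop():
    """Finish every loader that still exists; return how many finished."""
    count = 0
    for ref in pending:
        loader = ref()
        if loader is not None:
            loader.finished = True
            count += 1
    pending.clear()
    return count


class Window:
    def do_stuff_local(self):
        loader = Loader()                    # only a local variable refers to it
        pending.append(weakref.ref(loader))  # freed when this method returns

    def do_stuff_stored(self):
        self.loader = Loader()               # the window keeps it alive
        pending.append(weakref.ref(self.loader))


window = Window()
window.do_stuff_local()
print(run_event_loop())  # 0 in CPython: the local loader was already freed
window.do_stuff_stored()
print(run_event_loop())  # 1: the stored loader survived until the loop ran
```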

user3419537