
I'm trying to get the HTML source from a QWebEnginePage object. According to the Qt reference, QWebEnginePage's toHtml is an asynchronous method, described as follows:

Asynchronous method to retrieve the page's content as HTML, enclosed in HTML and BODY tags. Upon successful completion, resultCallback is called with the page's content.

So I tried to find out how to call this method synchronously.

The result I want is shown below.

class MainWindow(QWidget):
  html = None
  ...
  ...
  def store_html(self, data):
    self.html = data

  def get_html(self):
    current_page = self.web_view.page()
    current_page.toHtml(self.store_html)
    # I want to wait here until 'store_html' has been called,
    # but 'toHtml' is asynchronous, so self.html is still None when it is returned below.
    return self.html 
  ...
  ...
desertnaut
ko.nyk.93
  • It's unclear why you'd want this. QWebEngine is based on Blink, which runs a separate process for web content (just like most modern browsers.) Since the IPC call between the processes may take time, QWebEngine asks you to define a callback function so your main process' event loop can continue while the IPC call completes. So without knowing the justification for this question it would be a stab in the dark to provide the best possible answer. – MrEricSir Nov 02 '17 at 04:03
  • @MrEricSir I didn't know that QWebEngine is based on Blink. I just wanted to transform the web view's contents, after the HTML response is delivered, using some buttons that I created. Thank you for your answer. – ko.nyk.93 Nov 02 '17 at 04:32

3 Answers


A simple way to get that behavior is to use a QEventLoop. Calling exec_() on it blocks the code that follows until quit() is called, but events are still processed in the meantime, so the GUI keeps working.

from PyQt5.QtCore import *
from PyQt5.QtWidgets import *
from PyQt5.QtWebEngineWidgets import *


class Widget(QWidget):
    toHtmlFinished = pyqtSignal()

    def __init__(self, *args, **kwargs):
        QWidget.__init__(self, *args, **kwargs)
        self.setLayout(QVBoxLayout())
        self.web_view = QWebEngineView(self)
        self.web_view.load(QUrl("http://doc.qt.io/qt-5/qeventloop.html"))
        btn = QPushButton("Get HTML", self)
        self.layout().addWidget(self.web_view)
        self.layout().addWidget(btn)
        btn.clicked.connect(self.get_html)
        self.html = ""

    def store_html(self, html):
        self.html = html
        self.toHtmlFinished.emit()

    def get_html(self):
        current_page = self.web_view.page()
        current_page.toHtml(self.store_html)
        loop = QEventLoop()
        self.toHtmlFinished.connect(loop.quit)
        loop.exec_()
        print(self.html)


if __name__ == '__main__':
    import sys
    app = QApplication(sys.argv)
    w = Widget()
    w.show()
    sys.exit(app.exec_())

Note: The same method works for PySide2.
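The nested-loop pattern above can also be wrapped in a small helper with a timeout, so the wait cannot hang forever if the callback never arrives. This is only a sketch: `to_html_sync` and `timeout_ms` are hypothetical names introduced here, not part of the Qt API, and it assumes a QApplication (or QCoreApplication) instance already exists.

```python
from PyQt5.QtCore import QEventLoop, QTimer


def to_html_sync(page, timeout_ms=5000):
    """Wait for page.toHtml() to deliver its result.

    Returns the HTML string, or None if timeout_ms elapses first.
    (Hypothetical helper; the name and the timeout are assumptions.)
    """
    result = {}
    loop = QEventLoop()

    def on_html(html):
        # Called by the page when the HTML is ready.
        result["html"] = html
        loop.quit()

    # Safety net: leave the nested loop even if the callback never fires.
    timer = QTimer()
    timer.setSingleShot(True)
    timer.timeout.connect(loop.quit)
    timer.start(timeout_ms)

    page.toHtml(on_html)
    # Only spin the nested loop if the callback has not already run.
    if "html" not in result:
        loop.exec_()
    timer.stop()
    return result.get("html")
```

Inside the widget above, get_html could then be reduced to `print(to_html_sync(self.web_view.page()))`.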

eyllanesc

Here's a different approach, which also behaves differently from the QEventLoop method.

You can subclass QWebEngineView, hook into the loadFinished signal that load() eventually triggers, and add a custom read_html() method:

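from PyQt5.QtWebEngineWidgets import QWebEngineView


class MyWebView(QWebEngineView):

    def __init__(self, parent=None):
        super(MyWebView, self).__init__(parent)
        self.html = None
        # Connect once here so repeated read_html() calls don't stack up slots
        self.loadFinished.connect(self._read_page)

    def _store_html(self, html):
        self.html = html

    def _read_page(self, ok):
        # loadFinished passes a bool telling whether the load succeeded
        if ok:
            self.page().toHtml(self._store_html)

    def read_html(self, url):
        """
        Load url (a QUrl) and store the page's HTML in self.html once loading finishes
        """
        self.html = None
        self.load(url)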

This way the application doesn't halt while waiting for the page to finish loading; once the page is loaded, the HTML content is available in the html attribute.

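from PyQt5.QtCore import QUrl

class MainWindow(QWidget):
    def __init__(self):
        ...
        self.web_view = MyWebView(self)
        # load() expects a QUrl, not a plain string
        self.web_view.read_html(QUrl('https://www.xingyulei.com/'))
        ...
        self.btn.clicked.connect(self.print_html)

    def print_html(self):
        # Still None until the page has finished loading
        print(self.web_view.html)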
xingyulei

You could use one end of a multiprocessing.Pipe as the callback: pass the send method of the sending Connection to toHtml, then call recv on the other end immediately afterwards. recv blocks the calling thread until the HTML arrives, so keep that in mind: if it is called on the GUI thread, the Qt event loop is stalled for as long as it waits.

Example:

from multiprocessing import Pipe

class MainWindow(QWidget):
    def __init__(...):
        ...
        self.from_loopback, self.to_loopback = Pipe(False)

    def get_html(self):
        current_page = self.web_view.page()
        current_page.toHtml(self.to_loopback.send)
        return self.from_loopback.recv() 
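To see the loopback mechanism in isolation, here is a minimal sketch that runs without Qt. `fake_to_html` is a hypothetical stand-in for toHtml that calls back from a worker thread, and `get_html_blocking` and `timeout_s` are likewise names made up for illustration. It works here because the callback runs on a different thread from the one that is waiting. If a callback were instead delivered through the event loop of the thread blocked in recv(), the wait would never end, which is why the sketch uses poll() with a timeout rather than a bare recv().

```python
import threading
from multiprocessing import Pipe


def fake_to_html(callback):
    """Hypothetical stand-in for toHtml: calls back later from a worker thread."""
    html = "<html><body>hi</body></html>"
    threading.Timer(0.05, callback, args=(html,)).start()


def get_html_blocking(timeout_s=2.0):
    # One-way pipe: the first connection receives, the second sends.
    receiver, sender = Pipe(False)
    fake_to_html(sender.send)
    # poll() bounds the wait; recv() alone would block until data arrives.
    if receiver.poll(timeout_s):
        return receiver.recv()
    return None


if __name__ == "__main__":
    print(get_html_blocking())
```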
eyllanesc
norweeg
  • I see that now, but that's kind of dumb. It breaks up the flow of the conversation – norweeg Aug 23 '18 at 17:49
  • by the way, here's the docs on multiprocessing.Pipe: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues – norweeg Aug 23 '18 at 17:52