I want to scrape two values from a webpage.

The url is https://land.3fang.com/LandAssessment/b6d8b2c8-bd4f-4bd4-9d22-ca49a7a2dc1f.html.

The webpage generates two values with JavaScript.

Enter 5 in the text box and press the red button.

Two values are then displayed in red.

Please refer to the following image.

enter image description here

I tried using PyQt5, requests_html, and pyppeteer.

Here is the code for PyQt5:

import sys
from PyQt5 import QtCore, QtWidgets, QtWebEngineWidgets
from bs4 import BeautifulSoup

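class Render(QtWebEngineWidgets.QWebEnginePage):
    """Load a page, trigger its JavaScript, and capture the resulting HTML."""

    def __init__(self, url):
        self.html = ""
        self.first_pass = True
        # A QApplication must exist before any QWebEnginePage is created.
        self.app = QtWidgets.QApplication(sys.argv)
        super().__init__()
        self.loadFinished.connect(self._load_finished)
        self.loadProgress.connect(print)
        self.load(QtCore.QUrl(url))
        self.app.exec_()  # blocks until quit() is called

    def _load_finished(self, result):
        if not result:
            self.app.quit()  # load failed; do not block forever
        elif self.first_pass:
            # loadFinished can fire more than once; act only on the first success.
            self.first_pass = False
            self.call_js()

    def call_js(self):
        self.runJavaScript('document.getElementById("txtDistance").value = "5";')
        self.runJavaScript("CheckUserWhere();")
        # The values are presumably filled in asynchronously, so wait before
        # reading the DOM. The 3-second delay is a guess; adjust it as needed.
        QtCore.QTimer.singleShot(3000, lambda: self.toHtml(self.callable))

    def callable(self, data):
        # Receives the page HTML from toHtml(), then ends the event loop.
        self.html = data
        self.app.quit()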

url = "https://land.3fang.com/LandAssessment/b6d8b2c8-bd4f-4bd4-9d22-ca49a7a2dc1f.html"
web = Render(url)
soup = BeautifulSoup(web.html, 'html.parser')
_bpgj = soup.find('b', {'id':"_bpgj"}).string
_bSumPrice = soup.find('b', {'id':"_bSumPrice"}).string
print(_bpgj, _bSumPrice)

However, either IDLE restarts or there is no response for a long time.

How can I do this correctly?

Thank you very much.

Chan

1 Answer

Since the values on your page are generated by JavaScript, you need a tool that actually executes JavaScript, such as Selenium driving a real browser.

Also check out the following question, which has good existing answers: Web-scraping JavaScript page with Python
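
A minimal sketch of the Selenium approach might look like the following. It assumes Selenium 4 and a local Chrome installation. The element IDs (`txtDistance`, `_bpgj`, `_bSumPrice`) and the `CheckUserWhere()` call are taken from the code in the question; the names `fetch_values`, `results_ready`, and `BY_ID`, the 20-second timeout, and the assumption that both result elements stay empty until the values arrive are illustrative and not verified against the page.

```python
URL = "https://land.3fang.com/LandAssessment/b6d8b2c8-bd4f-4bd4-9d22-ca49a7a2dc1f.html"
RESULT_IDS = ("_bpgj", "_bSumPrice")  # IDs taken from the question's code
BY_ID = "id"  # same string as selenium.webdriver.common.by.By.ID


def results_ready(driver):
    """Return both result texts once they are non-empty, otherwise False."""
    texts = [driver.find_element(BY_ID, i).text.strip() for i in RESULT_IDS]
    return texts if all(texts) else False


def fetch_values(url=URL, distance="5", timeout=20):
    """Open the page, submit the distance, and wait for the two values."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes a Chrome driver can be located
    try:
        driver.get(url)
        box = driver.find_element(BY_ID, "txtDistance")
        box.clear()
        box.send_keys(distance)
        # Call the same page function the question's code calls.
        driver.execute_script("CheckUserWhere();")
        # Poll until results_ready returns a truthy value or the timeout expires.
        return WebDriverWait(driver, timeout).until(results_ready)
    finally:
        driver.quit()  # always close the browser
```

Calling `fetch_values()` should return a list with the two displayed values, or raise a `TimeoutException` if they never appear. Waiting on a condition rather than sleeping for a fixed time avoids reading the page before the script has finished.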

Murtaza Haji