
I am trying to use Beautiful Soup to parse the HTML from here: https://www.rightmove.co.uk/house-prices/br5/broadcroft-road.html?page=1

I have:

import requests
from bs4 import BeautifulSoup as soup

req = requests.get(url)
# page_soup = soup(req.content, 'html.parser')
page_soup = soup(req.content, 'lxml')
no_results = page_soup.find('div', {'class': 'section sort-bar-results'})
containers = page_soup.findAll('div', {'class': 'propertyCard'})
no_results, len(containers)

this returns (None, 0)
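As a sanity check, the selectors themselves can be tested against a minimal static snippet (the markup below is a hypothetical reduction of the page, not the real source) — they match fine, which suggests the data simply never appears in `req.content`:

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML containing the two classes the question targets
html = """
<div class="section sort-bar-results">35 sold properties</div>
<div class="propertyCard">...</div>
<div class="propertyCard">...</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
no_results = page_soup.find("div", {"class": "section sort-bar-results"})
containers = page_soup.find_all("div", {"class": "propertyCard"})
print(no_results.text, len(containers))  # -> 35 sold properties 2
```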

I looked at Beautiful Soup find() returns None?, Beautiful Soup returns 'none', Beautiful soup returns None, and Beautiful Soup returns None on existing element, but unfortunately none of them helped.

The sections of the HTML correspond to:

[screenshots of the sort-bar-results count and the propertyCard elements in the page's HTML]

Is there something obvious that I am missing?

frank

2 Answers


The page indeed has dynamic content, so you should use Selenium with a webdriver to load all the content before scraping.

You can try downloading the ChromeDriver executable here. If you place it in the same folder as your script, you can run:

import os
from selenium import webdriver
from bs4 import BeautifulSoup as soup

# configure a headless Chrome driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_driver = os.path.join(os.getcwd(), "chromedriver.exe")  # IF NOT IN SAME FOLDER CHANGE THIS PATH
driver = webdriver.Chrome(options=chrome_options, executable_path=chrome_driver)

url = 'https://www.rightmove.co.uk/house-prices/br5/broadcroft-road.html?page=1'

# let the browser render the page, then hand the full DOM to Beautiful Soup
driver.get(url)
page_soup = soup(driver.page_source, "html.parser")
no_results = page_soup.find('div', {'class': 'section sort-bar-results'})
containers = page_soup.findAll('div', {'class': 'propertyCard'})
print(no_results.text)
print(len(containers), "containers")

You said the answers didn't help, but I tried it here and it outputs:

35 sold properties
25 containers
Arthur Pereira
import requests
import re
import json


def main(url):
    r = requests.get(url)
    # the page embeds its data as a JSON object assigned to
    # __PRELOADED_STATE__ inside an inline <script>; extract it with a regex
    match = json.loads(
        re.search(r'__PRELOADED_STATE__.+?({.+?})<', r.text).group(1))
    # print(match.keys())  # full keys of the JSON dict
    print(match['results']['resultCount'])


main("https://www.rightmove.co.uk/house-prices/br5/broadcroft-road.html?page=1")

Output:

35

You don't need to use Selenium, as it will slow down your task. The desired element is already present in the page source: it's encoded as JSON within a dynamic <script> tag.
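The extraction step can be seen in isolation on a miniature stand-in for the page source (the JSON structure below is a hypothetical reduction of the real payload, keeping only the resultCount key used above):

```python
import json
import re

# Minimal stand-in for the page source: a JSON object assigned to
# __PRELOADED_STATE__ inside an inline <script> tag
html = '<script>window.__PRELOADED_STATE__ = {"results": {"resultCount": 35}}</script>'

# lazily capture the first {...} after the marker, up to the closing "<"
match = json.loads(re.search(r'__PRELOADED_STATE__.+?({.+?})<', html).group(1))
print(match['results']['resultCount'])  # -> 35
```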