
I am trying to use Beautiful Soup to parse the HTML from here: https://www.rightmove.co.uk/house-prices/br5/broadcroft-road.html?page=1

I have:

import requests
from bs4 import BeautifulSoup as soup

req = requests.get(url)
# page_soup = soup(req.content, 'html.parser')
page_soup = soup(req.content, 'lxml')
no_results = page_soup.find('div', {'class': 'section sort-bar-results'})
containers = page_soup.findAll('div', {'class': 'propertyCard'})
no_results, len(containers)

this returns (None, 0)
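As a sanity check, the selectors themselves can be tested against a minimal static snippet (the markup below is a hypothetical reduction of the page, not the real source) — they match fine, which suggests the data simply never appears in `req.content`:

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML containing the two classes the question targets
html = """
<div class="section sort-bar-results">35 sold properties</div>
<div class="propertyCard">...</div>
<div class="propertyCard">...</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
no_results = page_soup.find("div", {"class": "section sort-bar-results"})
containers = page_soup.find_all("div", {"class": "propertyCard"})
print(no_results.text, len(containers))  # -> 35 sold properties 2
```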

I looked at Beautiful Soup find() returns None?, Beautiful Soup returns 'none', Beautiful soup returns None, and Beautiful Soup returns None on existing element, but unfortunately none of them helped.

The sections of the HTML correspond to:

[screenshots of the sort-bar-results count and the propertyCard elements in the page's HTML]

Is there something obvious that I am missing?

frank

2 Answers


The page indeed has dynamic content, so you should use Selenium with a webdriver to load all the content before scraping.

You can try downloading the ChromeDriver executable here. If you place it in the same folder as your script, you can run:

import os
from selenium import webdriver
from bs4 import BeautifulSoup as soup

# configure a headless Chrome driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_driver = os.path.join(os.getcwd(), "chromedriver.exe")  # IF NOT IN SAME FOLDER CHANGE THIS PATH
driver = webdriver.Chrome(options=chrome_options, executable_path=chrome_driver)

url = 'https://www.rightmove.co.uk/house-prices/br5/broadcroft-road.html?page=1'

# let the browser render the page, then hand the full DOM to Beautiful Soup
driver.get(url)
page_soup = soup(driver.page_source, "html.parser")
no_results = page_soup.find('div', {'class': 'section sort-bar-results'})
containers = page_soup.findAll('div', {'class': 'propertyCard'})
print(no_results.text)
print(len(containers), "containers")

You said the answers didn't help, but I tried it here and it outputs:

35 sold properties
25 containers
Arthur Pereira
import requests
import re
import json


def main(url):
    r = requests.get(url)
    # the page embeds its data as a JSON object assigned to
    # __PRELOADED_STATE__ inside an inline <script>; extract it with a regex
    match = json.loads(
        re.search(r'__PRELOADED_STATE__.+?({.+?})<', r.text).group(1))
    # print(match.keys())  # full keys of the JSON dict
    print(match['results']['resultCount'])


main("https://www.rightmove.co.uk/house-prices/br5/broadcroft-road.html?page=1")

Output:

35

You don't need to use Selenium, as it will slow down your task. The desired element is already present in the page source: it's encoded as JSON within a dynamic <script> tag.
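The extraction step can be seen in isolation on a miniature stand-in for the page source (the JSON structure below is a hypothetical reduction of the real payload, keeping only the resultCount key used above):

```python
import json
import re

# Minimal stand-in for the page source: a JSON object assigned to
# __PRELOADED_STATE__ inside an inline <script> tag
html = '<script>window.__PRELOADED_STATE__ = {"results": {"resultCount": 35}}</script>'

# lazily capture the first {...} after the marker, up to the closing "<"
match = json.loads(re.search(r'__PRELOADED_STATE__.+?({.+?})<', html).group(1))
print(match['results']['resultCount'])  # -> 35
```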