
I did some web scraping on a site, but it only returns the first 20 elements on the page. The remaining elements load when you scroll down. How can I scrape those elements too? Is there a different method for doing that?

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.century21.com/real-estate/rock-spring-ga/LCGAROCKSPRING/")
c = r.content

soup = BeautifulSoup(c, "html5lib")

# Only the cards present in the initial HTML are found
# ("all" shadows the built-in, so renamed to all_cards)
all_cards = soup.find_all("div", {"class": "property-card-primary-info"})
len(all_cards)

It gives only 20, not all of them. How can I scrape the lazily loaded elements too?

Vaishali
Akhil Reddy
  • The other elements seem to be loaded after a scrolling action, so you might need another tool to extract them. – PRMoureu Nov 11 '17 at 08:18
  • what kind of tool? – Akhil Reddy Nov 11 '17 at 08:19
  • selenium could work, https://stackoverflow.com/questions/14583560/selenium-retrieve-data-that-loads-while-scrolling-down – PRMoureu Nov 11 '17 at 08:35
  • The items you want to parse on this page are not triggered by scrolling the `body`, so methods that use `body` as the target, such as `driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` or `driver.find_element_by_tag_name("body")`, won't make the browser load them. Instead, use a tag name or class name attached to an element you are after, and use that to make the browser scroll down. – SIM Nov 12 '17 at 12:37

2 Answers


There are two different approaches to this.

The first: scrape the data API behind the site. You will need to work out which request brings in the new information after the scroll. To find it, open your browser's dev tools (F12 in Chrome), go to the Network tab, and watch what is requested as you scroll.
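As a sketch of the API route: once you spot the request the page makes on scroll, you can page through it with plain requests. The endpoint URL and the `offset`/`size` parameter names below are placeholders of my choosing, not the site's real scheme; substitute whatever you actually see in the Network tab:

```python
# Placeholder endpoint -- replace with the URL seen in the Network tab.
API_URL = "https://www.century21.com/example/search/endpoint"

def page_params(page, page_size=20):
    """Query parameters for one page of results (hypothetical offset/size scheme)."""
    return {"offset": page * page_size, "size": page_size}

def fetch_all_pages(max_pages=5):
    import requests  # imported here so page_params stays dependency-free
    results = []
    for page in range(max_pages):
        r = requests.get(API_URL, params=page_params(page))
        if r.status_code != 200:
            break
        items = r.json().get("results", [])
        if not items:
            break  # paged past the end -- no more data
        results.extend(items)
    return results
```

With the real endpoint filled in, `fetch_all_pages()` would return every listing without a browser at all, which is usually faster and more robust than simulated scrolling.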

The second: use Selenium to open a browser instance, load the page as a normal browser would, scroll the page, and then retrieve the information.
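A minimal sketch of that scrolling loop, assuming the document height grows as content lazy-loads; the helper name `heights_settled` and the round/pause limits are illustrative choices, not part of Selenium's API:

```python
import time

def heights_settled(previous, current):
    """True once the document height stops growing, i.e. no more lazy content."""
    return current == previous

def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
    # Repeatedly jump to the bottom of the page; stop when the height is stable.
    prev = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded cards time to appear
        cur = driver.execute_script("return document.body.scrollHeight")
        if heights_settled(prev, cur):
            break
        prev = cur

# Usage (requires Selenium and a chromedriver on PATH):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://www.century21.com/real-estate/rock-spring-ga/LCGAROCKSPRING/")
# scroll_to_bottom(driver)
# html = driver.page_source  # now contains the lazily loaded cards
```

Note the comment above from SIM: on some pages scrolling via `document.body` does not trigger the lazy loading, in which case you may need to scroll a specific element into view instead.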

Gui

Use Selenium to scroll down, and then you can scrape the contents:

import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

link = "https://www.century21.com/real-estate/rock-spring-ga/LCGAROCKSPRING/"

browser = webdriver.Chrome(executable_path=os.path.join(os.getcwd(), 'chromedriver'))
browser.get(link)

body = browser.find_element_by_tag_name("body")

no_of_pagedowns = 2  # number of pages you would like to scroll

while no_of_pagedowns:
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)  # give the new cards time to load before the next scroll
    no_of_pagedowns -= 1

# The rendered page now includes the lazily loaded cards
soup = BeautifulSoup(browser.page_source, "html5lib")
cards = soup.find_all("div", {"class": "property-card-primary-info"})
Siva