
I did some web scraping on a site, but it only returns the first 20 elements on the page. The remaining elements load when you scroll down. How can I scrape those elements too? Is there a different method for doing that?

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.century21.com/real-estate/rock-spring-ga/LCGAROCKSPRING/")
c = r.content

soup = BeautifulSoup(c, "html5lib")

# Only the cards present in the initial HTML are found
# ("all" shadows the built-in, so renamed to all_cards)
all_cards = soup.find_all("div", {"class": "property-card-primary-info"})
len(all_cards)

It gives only 20, not all of them. How can I scrape the lazily loaded elements too?

Vaishali
Akhil Reddy
  • The other elements seem to be loaded after a scrolling action, so you might need another tool to extract them. – PRMoureu Nov 11 '17 at 08:18
  • what kind of tool? – Akhil Reddy Nov 11 '17 at 08:19
  • selenium could work, https://stackoverflow.com/questions/14583560/selenium-retrieve-data-that-loads-while-scrolling-down – PRMoureu Nov 11 '17 at 08:35
  • The items you want to parse on this page are not triggered by scrolling the `body`, so methods that use `body` as the target, such as `driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` or `driver.find_element_by_tag_name("body")`, won't make the browser load them. Instead, use a tag name or class name attached to an element you are after, and use that to make the browser scroll down. – SIM Nov 12 '17 at 12:37

2 Answers


There are two different approaches to this.

The first: scrape the data API behind the site. You will need to work out which request brings in the new information after the scroll. To find it, open your browser's dev tools (F12 in Chrome), go to the Network tab, and watch what is requested as you scroll.
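As a sketch of the API route: once you spot the request the page makes on scroll, you can page through it with plain requests. The endpoint URL and the `offset`/`size` parameter names below are placeholders of my choosing, not the site's real scheme; substitute whatever you actually see in the Network tab:

```python
# Placeholder endpoint -- replace with the URL seen in the Network tab.
API_URL = "https://www.century21.com/example/search/endpoint"

def page_params(page, page_size=20):
    """Query parameters for one page of results (hypothetical offset/size scheme)."""
    return {"offset": page * page_size, "size": page_size}

def fetch_all_pages(max_pages=5):
    import requests  # imported here so page_params stays dependency-free
    results = []
    for page in range(max_pages):
        r = requests.get(API_URL, params=page_params(page))
        if r.status_code != 200:
            break
        items = r.json().get("results", [])
        if not items:
            break  # paged past the end -- no more data
        results.extend(items)
    return results
```

With the real endpoint filled in, `fetch_all_pages()` would return every listing without a browser at all, which is usually faster and more robust than simulated scrolling.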

The second: use Selenium to open a browser instance, load the page as a normal browser would, scroll the page, and then retrieve the information.
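A minimal sketch of that scrolling loop, assuming the document height grows as content lazy-loads; the helper name `heights_settled` and the round/pause limits are illustrative choices, not part of Selenium's API:

```python
import time

def heights_settled(previous, current):
    """True once the document height stops growing, i.e. no more lazy content."""
    return current == previous

def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
    # Repeatedly jump to the bottom of the page; stop when the height is stable.
    prev = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded cards time to appear
        cur = driver.execute_script("return document.body.scrollHeight")
        if heights_settled(prev, cur):
            break
        prev = cur

# Usage (requires Selenium and a chromedriver on PATH):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://www.century21.com/real-estate/rock-spring-ga/LCGAROCKSPRING/")
# scroll_to_bottom(driver)
# html = driver.page_source  # now contains the lazily loaded cards
```

Note the comment above from SIM: on some pages scrolling via `document.body` does not trigger the lazy loading, in which case you may need to scroll a specific element into view instead.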

Gui

Use Selenium to scroll down, and then you can scrape the contents:

import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

link = "https://www.century21.com/real-estate/rock-spring-ga/LCGAROCKSPRING/"

browser = webdriver.Chrome(executable_path=os.path.join(os.getcwd(), 'chromedriver'))
browser.get(link)

body = browser.find_element_by_tag_name("body")

no_of_pagedowns = 2  # number of pages you would like to scroll

while no_of_pagedowns:
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)  # give the new cards time to load before the next scroll
    no_of_pagedowns -= 1

# The rendered page now includes the lazily loaded cards
soup = BeautifulSoup(browser.page_source, "html5lib")
cards = soup.find_all("div", {"class": "property-card-primary-info"})
Siva