
I am trying to extract data from https://www.realestate.com.au/. I first build the URL from the type of property I am looking for and then open it with Selenium WebDriver, but the page comes up blank. Any idea why this happens? Is it because the website does not permit web scraping? Is there any way to scrape this website?

Here is my code:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

PostCode = "2153"
propertyType = "house"
minBedrooms = "3"
maxBedrooms = "4"
page = "1"

url = "https://www.realestate.com.au/sold/property-{p}-with-{mib}-bedrooms-in-{po}/list-{pa}?maxBeds={mab}&includeSurrounding=false".format(p = propertyType, mib = minBedrooms, po = PostCode, pa = page, mab = maxBedrooms)
print(url)
# url should be "https://www.realestate.com.au/sold/property-house-with-3-bedrooms-in-2153/list-1?maxBeds=4&includeSurrounding=false"

driver = webdriver.Edge("./msedgedriver.exe") # edit the address to where your driver is located
driver.get(url)
time.sleep(3)

src = driver.page_source
soup = BeautifulSoup(src, 'html.parser')
print(soup)
Iman
  • Does `driver.get(url)` show no data in the UI? Also, did you try with ChromeDriver? – cruisepandey Sep 06 '21 at 06:44
  • Check out [robots.txt](https://www.realestate.com.au/robots.txt), they prohibit automated access to their website – Rustam Garayev Sep 06 '21 at 06:56
  • Thanks @cruisepandey for your response. I don't think a different driver would solve this issue. As Rustam pointed out, they strictly prohibit any automated access :( – Iman Sep 06 '21 at 07:09

2 Answers


You are passing the link incorrectly. Try this:

driver.get("your link")

API reference: https://selenium-python.readthedocs.io/api.html?highlight=get#:~:text=ef_driver.get(%22http%3A//www.google.co.in/%22)

Vadim
  • Thanks Vadim. May I know why you think it is incorrect? It still gives me nothing even if I put the link directly into the .get() function. – Iman Sep 06 '21 at 08:10
  • I copied and pasted the link directly from the browser. – Iman Sep 08 '21 at 01:52
  • Here is the link. Please try it and let me know how you go: https://www.realestate.com.au/sold/property-house-with-3-bedrooms-in-2153%3b/list-1?maxBeds=4&includeSurrounding=false – Iman Sep 08 '21 at 22:15

I tried to access realestate.com.au through Selenium and, in a different use case, through Scrapy. I even got results from Scrapy by sending a proper user-agent and cookie, but after a few days realestate.com.au detected Selenium / Scrapy and blocked the requests.

Additionally, it is clearly written in their terms & conditions that indexing any content on their website is strictly prohibited.
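The robots.txt file mentioned in the comments on the question can also be checked programmatically before you point a crawler at a site. Below is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the user-agent string "my-bot", the example.com URLs, and the sample rules are all hypothetical and only illustrate the mechanism; they are not the real contents of realestate.com.au's robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only; NOT the site's actual robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /sold/
"""

sold_allowed = is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/sold/list-1")
home_allowed = is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/")
print(sold_allowed, home_allowed)
```

To check a live site instead of a string, `RobotFileParser` can be given the robots.txt URL via `set_url()` and loaded with `read()`. Note that robots.txt only states the site's crawling policy; it does not replace reading the terms & conditions.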

You can find more information / analysis in these questions:

  1. Chrome browser initiated through ChromeDriver gets detected
  2. selenium isn't loading the page

Bottom line: you would have to get past their security if you want to scrape the content.

Hamza Tasneem