
I am trying to fetch every div tag with a class of "someClass" from a website.

The website needs to be scrolled down to load new div elements, so I used Keys.PAGE_DOWN. That scrolled the page, but the data still wasn't complete.

So I used:

elem = driver.find_element(By.TAG_NAME, "body")


no_of_pagedowns = 23

while no_of_pagedowns:
    elem.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.7)
    no_of_pagedowns-=1

It scrolls until the entire HTML page has loaded, but when I write the data to a file, it only writes 20 div tags instead of hundreds...
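(For reference, a common alternative to a fixed number of page-downs is to scroll until `document.body.scrollHeight` stops growing. This is a sketch, not part of the original code; `scroll_to_bottom` is a hypothetical helper that only relies on `driver.execute_script`:)

```python
import time

def scroll_to_bottom(driver, pause=0.7, max_rounds=50):
    """Scroll until the page height stops growing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded content time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded; assume we reached the bottom
        last_height = new_height
    return last_height
```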

Complete Code:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

url = 'https://divar.ir/s/tehran/buy-apartment/parand?price=200000000-450000000&non-negotiable=true&has-photo=true&q=%D8%AE%D8%A7%D9%86%D9%87%20%D9%BE%D8%B1%D9%86%D8%AF'
driver.get(url)


elem = driver.find_element(By.TAG_NAME, "body")


no_of_pagedowns = 23

while no_of_pagedowns:
    elem.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.3)
    no_of_pagedowns-=1

datas = driver.find_elements(By.CLASS_NAME, 'kt-post-card__body')

with open('data.txt', 'w') as f:
    for counter, data in enumerate(datas, start=1):
        f.write(f'{counter}--> {data.text}\n')
driver.quit()
Mosihere

3 Answers


To select only 20 <div> tags instead of hundreds, you can use list slicing together with either of the following locator strategies:

  • Using CSS_SELECTOR

    elements = driver.find_elements(By.CSS_SELECTOR, "div.kt-post-card__body")[:20]
    
  • Using XPATH:

    elements = driver.find_elements(By.XPATH, "//div[@class='kt-post-card__body']")[:20]
    

Ideally, you should induce WebDriverWait for visibility_of_all_elements_located() (this requires `from selenium.webdriver.support.ui import WebDriverWait` and `from selenium.webdriver.support import expected_conditions as EC`), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR

    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.kt-post-card__body")))[:20]
    
  • Using XPATH:

    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='kt-post-card__body']")))[:20]
    

Update

To select all the <div>s, drop the list slicing and use either of the following locator strategies:

  • Using CSS_SELECTOR

    elements = driver.find_elements(By.CSS_SELECTOR, "div.kt-post-card__body")
    
  • Using XPATH:

    elements = driver.find_elements(By.XPATH, "//div[@class='kt-post-card__body']")
    
undetected Selenium
  • Thanks, but exactly: I need all the div elements (hundreds), not just 20; that was the problem. Also, I removed the list slicing and checked your code, and it won't work :( @undetected Selenium – Mosihere Mar 27 '22 at 22:16
  • @Mosihere Checkout the updated answer and let me know the status. – undetected Selenium Mar 27 '22 at 22:18
  • One time it's 20 items, another time 24, but I guess more than 100 items exist – Mosihere Mar 27 '22 at 22:22

I checked the site and, as far as I can tell, it fetches its data as JSON through an API, using a cursor. The cursor is built from a timestamp stored in a variable called last-post-date; when you first load the site, this value is provided as lastPostDate inside a JSON response. To obtain the data quickly, this request can be used: https://divar.ir/s/tehran/buy-apartment/parand?price=200000000-450000000&non-negotiable=true&has-photo=true&q=%D8%AE%D8%A7%D9%86%D9%87%20%D9%BE%D8%B1%D9%86%D8%AF — take the lastPostDate value from it and update the last-post-date value in the JSON below with it.

{"json_schema":{"category":{"value":"apartment-sell"},"districts":{"vacancies":["427"]},"price":{"max":450000000,"min":200000000},"non-negotiable":true,"has-photo":true,"query":"خانه پرند"},"last-post-date":1647005920188580}

This updated JSON should then be sent as a POST request to the API endpoint below: https://api.divar.ir/v8/search/1/apartment-sell

A new JSON is returned, and inside it there is a "last_post_date" field; new queries can be made using this value. The required data is also stored in this JSON.

This is just an idea, but it seems to work when I test it with Postman.
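The cursor flow described above can be sketched in Python. The payload field names are taken from the JSON shown in this answer; the endpoint and the response field `last_post_date` are as described above, and everything else (helper name, error handling) is an assumption added here:

```python
import json

def build_payload(last_post_date):
    # Field names copied from the JSON observed in the site's network traffic
    # (see the JSON in the answer above); hypothetical helper for illustration.
    return {
        "json_schema": {
            "category": {"value": "apartment-sell"},
            "districts": {"vacancies": ["427"]},
            "price": {"max": 450000000, "min": 200000000},
            "non-negotiable": True,
            "has-photo": True,
            "query": "خانه پرند",
        },
        "last-post-date": last_post_date,
    }

if __name__ == "__main__":
    import requests  # third-party: pip install requests

    cursor = 1647005920188580  # initial lastPostDate taken from the page
    resp = requests.post(
        "https://api.divar.ir/v8/search/1/apartment-sell",
        json=build_payload(cursor),
    )
    data = resp.json()
    # The response reportedly contains "last_post_date" for the next query
    cursor = data.get("last_post_date")
    print(json.dumps(data, ensure_ascii=False)[:200])
```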

Tarık

The problem was implicit pagination!

So I used a for loop and updated the page number each time :)

for page in range(1, 11):
    url = f'https://divar.ir/s/tehran/buy-apartment/parand?price=450000000-200000000&non-negotiable=true&has-photo=true&page={page}'
    driver.get(url)
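Combining this pagination loop with the extraction code from the question, a sketch could look like the following; `page_url` is a small helper introduced here for clarity, and the Selenium part assumes the setup from the question:

```python
def page_url(page):
    """Build the listing URL for a given page number (hypothetical helper)."""
    base = ('https://divar.ir/s/tehran/buy-apartment/parand'
            '?price=450000000-200000000&non-negotiable=true&has-photo=true')
    return f'{base}&page={page}'

if __name__ == '__main__':
    # Selenium setup as in the question
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    counter = 1
    with open('data.txt', 'w') as f:
        for page in range(1, 11):
            driver.get(page_url(page))
            for card in driver.find_elements(By.CLASS_NAME, 'kt-post-card__body'):
                f.write(f'{counter}--> {card.text}\n')
                counter += 1
    driver.quit()
```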
Mosihere