
I am trying to scrape the links in the `href` attributes of the child elements of the parent with `id='search-properties'` from this site. I first tried locating the element with `find_elements_by_id` and then locating the links with `find_elements_by_css_selector`, but I kept getting `AttributeError: 'list' object has no attribute 'find_elements_by_css_selector'`, because `find_elements_by_id` returns a list rather than a single element. I then tried `find_elements_by_tag_name` as well as `find_elements_by_xpath`, but instead of scraping the links these calls scraped the text inside the links, which is of no use to me. After a lot of looking around I finally found this code:

    from selenium import webdriver

    PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe"  # always keep chromedriver.exe inside scripts to save hours of debugging
    driver = webdriver.Chrome(PATH)  # pretty important part
    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    driver.implicitly_wait(10)

    # collect every anchor on the page
    house = driver.find_elements_by_tag_name("a")
    for lnk in house:
        # get_attribute('href') reads each anchor's link target
        print(lnk.get_attribute('href'))
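An aside on the `AttributeError` from the earlier attempts: `find_elements_by_id` (plural) returns a plain Python list, and lists have no `find_elements_*` methods. A stand-in sketch using an ordinary list (no Selenium needed) reproduces the same error:

```python
# Stand-in for the list that find_elements_by_id returns;
# ordinary Python lists have no find_elements_* methods.
elements = ["<element 1>", "<element 2>"]

try:
    elements.find_elements_by_css_selector("a")
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'find_elements_by_css_selector'
```

The fix is either to index into the list (`elements[0].find_elements_by_css_selector(...)`) or to use the singular `find_element_by_id`, which returns one element.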

The problem with that code is that it scrapes all the links on the page, including ones that are completely unnecessary, like the `javascript:void(0)` entries shown in this image. Finally, for pagination I tried to follow this answer, but I got an infinite loop, so I had to remove the pagination code. In conclusion, I am trying to get the links under `id='search-properties'` across multiple pages.
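In the meantime, one way to drop the unwanted `javascript:void(0)` entries from the scraped list is a simple filter. This is only a sketch; the `keep_real_links` helper and the sample URLs are hypothetical, not part of the code above:

```python
def keep_real_links(hrefs):
    """Drop empty hrefs and javascript:void(0) pseudo-links."""
    return [h for h in hrefs if h and not h.startswith("javascript:")]

scraped = [
    "https://www.gharbazar.com/property/some-listing",  # hypothetical listing URL
    "javascript:void(0)",
    None,  # anchors without an href
]
print(keep_real_links(scraped))  # only the real property link survives
```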

Recurfor

2 Answers


Try this.

    links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")

    for ele in links:
        print(ele.get_attribute('href'))
pmadhu
  • I appreciate your effort, but it did not handle the pagination part, which is my main concern... If my question is difficult to understand or vague, let me know; I am willing to clarify or edit it. – Recurfor Jul 22 '21 at 09:48

I tried this for pagination.

    from selenium import webdriver
    import time

    driver = webdriver.Chrome(executable_path="path")
    driver.implicitly_wait(10)
    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    page=2

    while True:
        nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
        driver.execute_script("arguments[0].scrollIntoView(true);",nextoption)
        driver.execute_script("window.scrollBy(0,-300)")
        time.sleep(5)
        try:
            driver.find_element_by_link_text(str(page)).click()
            page += 1
            time.sleep(3)

        except Exception as e:
            print(e)
            break

    driver.quit()

I tried this for getting links from each page.

    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    page=2
    pagelinks= []
    #links of the 1st page
    links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
    for ele in links:
        pagelinks.append(ele.get_attribute('href'))

    while True:
        nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
        driver.execute_script("arguments[0].scrollIntoView(true);",nextoption)
        driver.execute_script("window.scrollBy(0,-300)")
        time.sleep(5)
        try:
            driver.find_element_by_link_text(str(page)).click()
            page += 1
            links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
            for ele in links:
                pagelinks.append(ele.get_attribute('href'))
            time.sleep(3)

        except Exception as e:
            print(e)
            break

    print(len(pagelinks))
    for i in range(len(pagelinks)):
        print(pagelinks[i])

    driver.quit()
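If the last page ever comes up short, one plausible cause is that the `>>` lookup at the top of the loop sits outside the `try` and can fail once there is no further page. The control flow itself can be checked without a browser; the `StubDriver` class below is purely hypothetical, a stand-in just to exercise the click-then-scrape loop:

```python
class StubDriver:
    """Hypothetical stand-in for the real driver, to exercise the loop logic."""
    def __init__(self, pages):
        self.pages = pages      # page number -> list of hrefs on that page
        self.current = 1
    def links_on_page(self):
        return self.pages[self.current]
    def click_page(self, page):
        if page not in self.pages:
            raise Exception("no link with text %s" % page)
        self.current = page

driver = StubDriver({1: ["a1"], 2: ["b1", "b2"], 3: ["c1"]})
pagelinks = list(driver.links_on_page())          # scrape the 1st page up front
page = 2
while True:
    try:
        driver.click_page(page)                   # click first ...
        pagelinks.extend(driver.links_on_page())  # ... then scrape, so the
        page += 1                                 # final page is not skipped
    except Exception:
        break                                     # no further page link: done

print(pagelinks)  # links from every page, last one included
```

Keeping every step that can fail inside the `try` means the loop ends cleanly after the final page instead of crashing partway through it.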
pmadhu
  • The last page, i.e. page 14, is not being scraped by the code. Also, here `nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")` why did you use `div` instead of `*`, and what does `>>` do? – Recurfor Jul 22 '21 at 18:38
  • Not sure about the 14th page. I use the tag name instead of `*` just to make sure the right element is pointed at. `>>` is the link text; I am using it to scroll down so that the page numbers are visible and clickable. – pmadhu Jul 23 '21 at 07:31
  • Last time I saw you had 11 reputation and now all of a sudden 51, how?.. I am thinking of clicking the 14th page with the click() method, and after scraping that individual page I will add it to the CSV file I created... I tried using try and except but nothing was displayed – Recurfor Jul 23 '21 at 07:36
  • I tried to answer other questions, so 51. I tried the code below for writing the CSV file. `myfile = open("C:\loginsession\output.csv",'w',newline='') with myfile: writerdata = csv.writer(myfile) for ele in pagelinks: writerdata.writerow([ele])` – pmadhu Jul 23 '21 at 11:09
  • I have already converted it into a CSV file using pandas; however, the last page not being scraped is still a mystery, so I have decided to manually copy the last links into the CSV file. You have been very helpful, and that is why I am going to accept your answer. One last question: why is string conversion necessary here: `driver.find_element_by_link_text(str(page)).click()`, and why `arguments[0]` here: `driver.execute_script("arguments[0].scrollIntoView(true);",nextoption)`? – Recurfor Jul 23 '21 at 20:03
  • `str(page)` because if the input is given like `"page"`, it searches for the link text `"page"`, not the actual page number, which is `"2"` or `"3"` and so on. Refer to this for `"arguments[0]"` – [link](https://stackoverflow.com/q/52273298/16452840) – pmadhu Jul 24 '21 at 05:33
  • So it's done to convert the integer value into a string... I finally understand. Also, thanks for the link... – Recurfor Jul 24 '21 at 08:48