
I am trying to scrape the links in the `href` attributes of the child elements of the parent with `id='search-properties'` from this site. I first tried locating the element with `find_elements_by_id` and then locating the links with `find_elements_by_css_selector`, but I kept getting `AttributeError: 'list' object has no attribute 'find_elements_by_css_selector'`, because `find_elements_by_id` returns a list rather than a single element. I then tried `find_elements_by_tag_name` as well as `find_elements_by_xpath`, but instead of scraping the links these calls scraped the text inside the links, which is of no use to me. After a lot of looking around I finally found this code:

    from selenium import webdriver

    PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe"  # always keep chromedriver.exe inside scripts to save hours of debugging
    driver = webdriver.Chrome(PATH)  # pretty important part
    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    driver.implicitly_wait(10)

    # collect every anchor on the page
    house = driver.find_elements_by_tag_name("a")
    for lnk in house:
        # get_attribute('href') reads each anchor's link target
        print(lnk.get_attribute('href'))
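An aside on the `AttributeError` from the earlier attempts: `find_elements_by_id` (plural) returns a plain Python list, and lists have no `find_elements_*` methods. A stand-in sketch using an ordinary list (no Selenium needed) reproduces the same error:

```python
# Stand-in for the list that find_elements_by_id returns;
# ordinary Python lists have no find_elements_* methods.
elements = ["<element 1>", "<element 2>"]

try:
    elements.find_elements_by_css_selector("a")
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'find_elements_by_css_selector'
```

The fix is either to index into the list (`elements[0].find_elements_by_css_selector(...)`) or to use the singular `find_element_by_id`, which returns one element.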

The problem with that code is that it scrapes all the links on the page, including ones that are completely unnecessary, like the `javascript:void(0)` entries shown in this image. Finally, for pagination I tried to follow this answer, but I got an infinite loop, so I had to remove the pagination code. In conclusion, I am trying to get the links under `id='search-properties'` across multiple pages.
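In the meantime, one way to drop the unwanted `javascript:void(0)` entries from the scraped list is a simple filter. This is only a sketch; the `keep_real_links` helper and the sample URLs are hypothetical, not part of the code above:

```python
def keep_real_links(hrefs):
    """Drop empty hrefs and javascript:void(0) pseudo-links."""
    return [h for h in hrefs if h and not h.startswith("javascript:")]

scraped = [
    "https://www.gharbazar.com/property/some-listing",  # hypothetical listing URL
    "javascript:void(0)",
    None,  # anchors without an href
]
print(keep_real_links(scraped))  # only the real property link survives
```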

Recurfor

2 Answers


Try this.

    links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")

    for ele in links:
        print(ele.get_attribute('href'))
pmadhu
  • I appreciate your effort, but it did not handle the pagination part, which is my main concern... If my question is difficult to understand or vague, let me know; I am willing to clarify or edit it. – Recurfor Jul 22 '21 at 09:48

I tried this for pagination.

    from selenium import webdriver
    import time

    driver = webdriver.Chrome(executable_path="path")
    driver.implicitly_wait(10)
    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    page=2

    while True:
        nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
        driver.execute_script("arguments[0].scrollIntoView(true);",nextoption)
        driver.execute_script("window.scrollBy(0,-300)")
        time.sleep(5)
        try:
            driver.find_element_by_link_text(str(page)).click()
            page += 1
            time.sleep(3)

        except Exception as e:
            print(e)
            break

    driver.quit()

I tried this for getting links from each page.

    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    page=2
    pagelinks= []
    #links of the 1st page
    links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
    for ele in links:
        pagelinks.append(ele.get_attribute('href'))

    while True:
        nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
        driver.execute_script("arguments[0].scrollIntoView(true);",nextoption)
        driver.execute_script("window.scrollBy(0,-300)")
        time.sleep(5)
        try:
            driver.find_element_by_link_text(str(page)).click()
            page += 1
            links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
            for ele in links:
                pagelinks.append(ele.get_attribute('href'))
            time.sleep(3)

        except Exception as e:
            print(e)
            break

    print(len(pagelinks))
    for i in range(len(pagelinks)):
        print(pagelinks[i])

    driver.quit()
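If the last page ever comes up short, one plausible cause is that the `>>` lookup at the top of the loop sits outside the `try` and can fail once there is no further page. The control flow itself can be checked without a browser; the `StubDriver` class below is purely hypothetical, a stand-in just to exercise the click-then-scrape loop:

```python
class StubDriver:
    """Hypothetical stand-in for the real driver, to exercise the loop logic."""
    def __init__(self, pages):
        self.pages = pages      # page number -> list of hrefs on that page
        self.current = 1
    def links_on_page(self):
        return self.pages[self.current]
    def click_page(self, page):
        if page not in self.pages:
            raise Exception("no link with text %s" % page)
        self.current = page

driver = StubDriver({1: ["a1"], 2: ["b1", "b2"], 3: ["c1"]})
pagelinks = list(driver.links_on_page())          # scrape the 1st page up front
page = 2
while True:
    try:
        driver.click_page(page)                   # click first ...
        pagelinks.extend(driver.links_on_page())  # ... then scrape, so the
        page += 1                                 # final page is not skipped
    except Exception:
        break                                     # no further page link: done

print(pagelinks)  # links from every page, last one included
```

Keeping every step that can fail inside the `try` means the loop ends cleanly after the final page instead of crashing partway through it.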
pmadhu
  • The last page, i.e. page 14, is not being scraped by the code. Also, here `nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")` why did you use `div` instead of `*`, and what does `>>` do? – Recurfor Jul 22 '21 at 18:38
  • Not sure about the 14th page. I use the tag name instead of `*` just to make sure the right element is pointed at. `>>` is the link text; I am using it to scroll down so that the page numbers are visible and clickable. – pmadhu Jul 23 '21 at 07:31
  • Last time I saw you had 11 reputation and now all of a sudden 51, how?.. I am thinking of clicking the 14th page with the click() method, and after scraping that individual page I will add it to the CSV file I created... I tried using try and except but nothing was displayed – Recurfor Jul 23 '21 at 07:36
  • I tried to answer other questions, so 51. I tried the code below for writing the CSV file. `myfile = open("C:\loginsession\output.csv",'w',newline='') with myfile: writerdata = csv.writer(myfile) for ele in pagelinks: writerdata.writerow([ele])` – pmadhu Jul 23 '21 at 11:09
  • I have already converted it into a CSV file using pandas; however, the last page not being scraped is still a mystery, so I have decided to manually copy the last links into the CSV file. You have been very helpful, and that is why I am going to accept your answer. One last question: why is string conversion necessary here: `driver.find_element_by_link_text(str(page)).click()`, and why `arguments[0]` here: `driver.execute_script("arguments[0].scrollIntoView(true);",nextoption)`? – Recurfor Jul 23 '21 at 20:03
  • `str(page)` because if the input is given like `"page"`, it searches for the link text `"page"`, not the actual page number, which is `"2"` or `"3"` and so on. Refer to this for `"arguments[0]"` – [link](https://stackoverflow.com/q/52273298/16452840) – pmadhu Jul 24 '21 at 05:33
  • So it's done to convert the integer value into a string... I finally understand. Also, thanks for the link... – Recurfor Jul 24 '21 at 08:48