0

I am currently trying to get the href out of the following web page structure:

<div style="something"> # THIS IS THE MAIN DIV I CAN GET
    <div class="aegieogji"> # First ROW sub-div under the main div
        <div class="aegegaegeg"> # SUB-SUB-DIV
            <a class="egaiegeigaegeigaegge" href="link_I_need">Text</a> # First HREF
        <div class="eagegeg"> # SUB-SUB-DIV
            <a class="egaegegaegaeg" href="link_I_need">Text</a> # Second HREF
        <div class="agaeheahrhrahrhr"> # SUB-SUB-DIV
            <a class="arhrharhrahrah" href="link_I_need">Text</a> # Third HREF

    <div class="argagragragaw"> # Second ROW sub-div under the main div
        <div class="aarhrahrah"> # SUB-SUB-DIV
            <a class="arhahrhahr" href="link_I_need">Text</a> # First HREF
        <div class="ahrrahrae"> # SUB-SUB-DIV
            <a class="eagregargreg" href="link_I_need">Text</a> # Second HREF
        <div class="ergrgegaegr"> # SUB-SUB-DIV
            <a class="aegaegregrege" href="link_I_need">Text</a> # Third HREF
        ...
        ...
</div>

Using Python Selenium and ChromeDriver I can read the main div "something":

main_elem = browser.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]")

From here, I am struggling to use Selenium correctly to get the href of every link inside all the sub-sub-divs.

Do you have any idea how I can easily get those? Thank you

PS: I can see that the first sub-sub-div has the following xpath:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[1]

Then the second:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[2]

and so on while the second row sub-sub-div xpath is:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[2]/div[1]

so there's div[2] rather than div[1], and so on.

undetected Selenium
Frank8992

3 Answers

0

Once you have the main (parent) element, you can get all the descendant elements that have an href attribute and read their values, as follows:

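from selenium.webdriver.common.by import By

# ".//" keeps the search relative to main_elem; "@href" tests the attribute
children = main_elem.find_elements(By.XPATH, ".//a[@href]")
for child in children:
    href = child.get_attribute("href")
    print(href)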
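
To see why the predicate needs the `@`, here is a minimal sketch that runs without a browser. It uses the standard library's `xml.etree.ElementTree`, which implements only a subset of XPath, and a made-up, well-formed sample document (the class names and `/p/...` links are placeholders, not taken from the real page). In XPath, `a[@href]` matches `a` elements that carry an `href` attribute, while `a[href]` matches `a` elements that have a child element named `href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page structure in the question.
SAMPLE = """
<div style="something">
  <div class="row1">
    <div class="cell"><a class="x" href="/p/1">Text</a></div>
    <div class="cell"><a class="y" href="/p/2">Text</a></div>
  </div>
  <div class="row2">
    <div class="cell"><a class="z" href="/p/3">Text</a></div>
    <div class="cell"><a class="w">No link</a></div>
  </div>
</div>
"""

main = ET.fromstring(SAMPLE)

# Attribute test: every <a> below the main div that has an href attribute.
with_attr = [a.get("href") for a in main.findall(".//a[@href]")]

# Child-element test: <a> elements containing an <href> child (none here).
with_child = main.findall(".//a[href]")

print(with_attr)
print(len(with_child))
```

The same distinction applies to the XPath passed to Selenium's `find_elements`, although the browser's XPath engine supports far more than ElementTree does.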
Prophet
0

To extract the values of all the href attributes, you need to use WebDriverWait with visibility_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[style='something'] div div>a")))])
    
  • Using XPATH:

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@style='something']//div//div/a")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
undetected Selenium
0

Thanks very much for the help. I combined both answers and found the solution for my case:

# read the main DIV with XPATH
...
# read all the links under the sub-divs whose href starts with "/p/";
# the leading "." keeps the search inside the main DIV (a bare "//" would scan the whole page)
link_elems = element.find_elements(By.XPATH, './/div//div//div/a[starts-with(@href, "/p/")]')
# retrieve the href
for link_elem in link_elems:
    post_href = link_elem.get_attribute("href")
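
If you would rather filter in Python than in the XPath, keep in mind that `get_attribute("href")` typically returns the fully resolved URL, not the relative `/p/...` value from the page source, so a plain `startswith("/p/")` on the returned string may not match. Below is a rough sketch that compares only the path part and drops duplicates. The helper name `filter_post_links` and the `example.com` URLs are made up for illustration.

```python
from urllib.parse import urlparse


def filter_post_links(hrefs, prefix="/p/"):
    """Return unique hrefs whose URL path starts with prefix, keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # get_attribute() can return None when there is no href
            continue
        if urlparse(href).path.startswith(prefix) and href not in seen:
            seen.add(href)
            result.append(href)
    return result


# Hypothetical values, standing in for what get_attribute("href") might return.
sample = [
    "https://example.com/p/abc/",
    "https://example.com/explore/",
    "https://example.com/p/abc/",  # duplicate
    None,
    "/p/def/",
]
post_links = filter_post_links(sample)
print(post_links)
```

With Selenium, the input list would be something like `[e.get_attribute("href") for e in link_elems]`.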
Frank8992