How can I get the `href` attributes of nested `<a>` from specific `<ul>` into a list?

Question

I have a <ul> with xpath:position = //ul[5] which contains some <a> elements.

The first <a> has xpath:position = //ul[5]/li/div/div/a, the next <a> has xpath:position = //ul[5]/li[2]/div/div/a, the next has xpath:position = //ul[5]/li[3]/div/div/a, and so on...

So, for every additional <a> in this <ul>, the xpath:position of the <a> gets an index [#] after <li>.

What I need is an example of how to count how many <a> elements exist in this specific <ul> and then get the href attribute of each <a> into a list.

I have tried this:

    WebDriver driver = DriverFactory.getWebDriver()
    def aCount = driver.findElements(By.xpath("//ul[5]/li/div/div/a")).size()
    println aCount

But it counts all the <a> elements of the page, not only the ones within the <ul> with xpath:position = //ul[5]!

If `"//ul[5]/li/div/div/a"` is counting all `<a>` elements in your page, then your `xpath` is not correct, as it is also matching other places. You can trace back through the ancestor tags of `ul[5]` to make sure it is unique and only hits the desired tags. Check the `xpath` in Chrome before using it in code. — fam, Feb 14 '22 at 12:19
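One possible reason an expression like `//ul[5]/li/div/div/a` matches more than expected (whether it applies to this particular page is an assumption) is how the `[5]` predicate binds: in `//ul[5]` it belongs to the `ul` step, so it selects every `<ul>` that is the fifth `<ul>` child of its own parent, which can be several lists on one page. `(//ul)[5]` instead selects the single fifth `<ul>` in document order. The sketch below checks this offline with the JDK's built-in `javax.xml.xpath` rather than Selenium; the markup and the class name `UlIndexDemo` are hypothetical. Selenium's `By.xpath` is evaluated by the browser as XPath 1.0, so the same predicate rules should apply.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UlIndexDemo {

    // Hypothetical markup: two sibling <div> sections, each holding five <ul>s,
    // and every <ul> holding one link at li/div/div/a.
    static String sampleMarkup() {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int section = 1; section <= 2; section++) {
            sb.append("<div>");
            for (int list = 1; list <= 5; list++) {
                sb.append("<ul><li><div><div><a href='/s").append(section)
                  .append("/l").append(list).append("'>link</a></div></div></li></ul>");
            }
            sb.append("</div>");
        }
        return sb.append("</body></html>").toString();
    }

    // Parses the markup as XML and returns how many nodes the XPath expression matches.
    static int count(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = sampleMarkup();
        // //ul[5]: every <ul> that is the 5th <ul> child of its own parent (one per section here)
        System.out.println(count(xml, "//ul[5]/li/div/div/a"));   // 2
        // (//ul)[5]: the 5th <ul> in the whole document, so exactly one list
        System.out.println(count(xml, "(//ul)[5]/li/div/div/a")); // 1
    }
}
```

To run the same check on the real page, typing `$x("(//ul)[5]//a")` in the Chrome DevTools Console returns the matched nodes.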

pburgr · Answer 1 · 2022-02-14T14:45:07.227

Using absolute XPaths makes the test less robust against HTML changes, so it is better to avoid them.

All you need is a combination of:

- working with parent/child elements using `element.findElements(By by)`
- finding child elements with `By.tagName(String tagName)`

Code example:

```java
package tests;

import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

import selenium.ChromeDriverSetup;

public class CollectHrefsTest extends ChromeDriverSetup {

    public static void main(String[] args) {
        
        List<String> hrefs = new ArrayList<String>();
        WebDriver driver = startChromeDriver(); // wrapped driver init
        driver.get("https://www.stackoverflow.com");
        List<WebElement> ulTags = driver.findElements(By.tagName("ul"));
        for (WebElement ulTag: ulTags) {
            List<WebElement> liTags = ulTag.findElements(By.tagName("li"));
            for (WebElement liTag: liTags) {
                List<WebElement> aTags = liTag.findElements(By.tagName("a"));
                for (WebElement aTag: aTags) {
                    String href = aTag.getAttribute("href");
                    if (href != null) {
                        hrefs.add(href);
                        System.out.println(href);
                    }
                    else {
                        System.out.println("href is null");
                    }
                }
            }
        }
        System.out.println("hrefs collected: " + hrefs.size());
        driver.quit();
    }

}
```

Output:

```text
Starting ChromeDriver 97.0.4692.71 (adefa7837d02a07a604c1e6eff0b3a09422ab88d-refs/branch-heads/4692@{#1247}) on port 13301
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
[1644849838.445][WARNING]: This version of ChromeDriver has not been tested with Chrome version 98.
Úno 14, 2022 3:43:58 ODP. org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
https://stackoverflow.com/
https://stackoverflow.com/help
https://chat.stackoverflow.com/?tab=site&host=stackoverflow.com
https://meta.stackoverflow.com/
https://stackoverflow.com/questions
https://stackoverflow.com/jobs
https://stackoverflow.com/jobs/directory/developer-jobs
https://stackoverflow.com/jobs/salary
https://stackoverflow.com/help
href is null
href is null
https://stackoverflow.com/teams
https://stackoverflow.com/talent
https://stackoverflow.com/advertising
https://stackoverflowsolutions.com/explore-teams
https://stackoverflow.co/
https://stackoverflow.co/company/press
https://stackoverflow.co/company/work-here
https://stackoverflow.com/legal
https://stackoverflow.com/legal/privacy-policy
https://stackoverflow.com/legal/terms-of-service
https://stackoverflow.co/company/contact
https://stackoverflow.com/#
https://stackoverflow.com/legal/cookie-policy
https://stackexchange.com/sites#technology
https://stackexchange.com/sites#culturerecreation
https://stackexchange.com/sites#lifearts
https://stackexchange.com/sites#science
https://stackexchange.com/sites#professional
https://stackexchange.com/sites#business
https://api.stackexchange.com/
https://data.stackexchange.com/
https://stackoverflow.blog/?blb=1
https://www.facebook.com/officialstackoverflow/
https://twitter.com/stackoverflow
https://linkedin.com/company/stack-overflow
https://www.instagram.com/thestackoverflow
hrefs collected: 35
```

And how do I specify, as a first step, that I want to search only in `ul[5]`? Only in `xpath` can I see it like this... Any other way, it appears just as `<ul>`. — Simos Sigma, Feb 14 '22 at 11:56
`WebElement ul5 = driver.findElements(By.tagName("ul")).get(4);` — pburgr, Feb 14 '22 at 13:33
Okay, I got this... Now how do I point to the child `<a>`s and get all the `href`s into a list? — Simos Sigma, Feb 14 '22 at 13:57
I had to change some things to make it work, and I think it needs a `break` or something! The script never stops running... — Simos Sigma, Feb 14 '22 at 15:15
I can't tell without access to the URL you are testing. Put some breakpoints and debug the code to see what is coming in each `List`. — pburgr, Feb 14 '22 at 15:26
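The two steps from the comments above (first pick the fifth `<ul>`, then search only inside it) can be sketched without a browser. This version mirrors them with the JDK's `org.w3c.dom` API on a hypothetical page: `getElementsByTagName("ul").item(4)` plays the role of `findElements(By.tagName("ul")).get(4)`, and calling `getElementsByTagName("a")` on that element limits the search to its descendants, as `WebElement.findElements` does. One difference to keep in mind: DOM's `getAttribute` returns an empty string rather than `null` for a missing attribute, so the sketch checks `hasAttribute` first. The class name `FifthUlHrefs` and the markup are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FifthUlHrefs {

    // Hypothetical page: five <ul>s, and only the fifth holds the links we want.
    static final String PAGE = "<html><body>"
            + "<ul><li><a href='/nav1'>n</a></li></ul>"
            + "<ul><li><a href='/nav2'>n</a></li></ul>"
            + "<ul><li>no link</li></ul>"
            + "<ul><li><a href='/nav4'>n</a></li></ul>"
            + "<ul>"
            + "<li><div><div><a href='/item1'>1</a></div></div></li>"
            + "<li><div><div><a href='/item2'>2</a></div></div></li>"
            + "<li><div><div><a>no href</a></div></div></li>"
            + "</ul>"
            + "</body></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Collects the href of every <a> nested at any depth inside the fifth <ul>.
    static List<String> hrefsInFifthUl(Document doc) {
        List<String> hrefs = new ArrayList<>();
        // Counterpart of driver.findElements(By.tagName("ul")).get(4): index 4 is the fifth <ul>
        Element ul5 = (Element) doc.getElementsByTagName("ul").item(4);
        if (ul5 == null) {
            return hrefs; // fewer than five <ul>s on the page
        }
        // Counterpart of ul5.findElements(By.tagName("a")): searches ul5's descendants only
        NodeList links = ul5.getElementsByTagName("a");
        for (int i = 0; i < links.getLength(); i++) {
            Element a = (Element) links.item(i);
            // DOM getAttribute returns "" rather than null for a missing attribute
            if (a.hasAttribute("href")) {
                hrefs.add(a.getAttribute("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        List<String> hrefs = hrefsInFifthUl(parse(PAGE));
        System.out.println(hrefs);                              // [/item1, /item2]
        System.out.println("hrefs collected: " + hrefs.size()); // hrefs collected: 2
    }
}
```

In Selenium itself the corresponding chain would be `driver.findElements(By.tagName("ul")).get(4).findElements(By.tagName("a"))`, followed by the same `href != null` check used in the answer.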

score 1 · Answer 2 · answered Feb 14 '22 at 22:42

All the <a> elements are within their ancestor <li>, and all the <li>s are within //ul[5]. So the solution is to iterate through all the <li>s, and you can use the following locator strategy:

```groovy
WebDriver driver = DriverFactory.getWebDriver()
def aCount = driver.findElements(By.xpath("//ul[5]//li/div/div/a")).size()
                      //note the double slash here ^
println aCount
```
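To see what the double slash changes, this sketch compares the two forms with the JDK's `javax.xml.xpath` on hypothetical markup: a single slash (`ul/li`) only reaches `<li>` elements that are direct children of the `<ul>`, while a double slash (`ul//li`) also reaches the `<li>`s of lists nested inside it. The list is anchored by a made-up `id='target'` so that positional `[n]` indexing does not blur the comparison; the class name `SlashVsDoubleSlash` is likewise illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SlashVsDoubleSlash {

    // Hypothetical markup: a list (id 'target' is made up) whose first <li>
    // also contains a nested <ul> with its own <li>.
    static final String PAGE = "<html><body><ul id='target'>"
            + "<li><div><div><a href='/top'>top</a></div></div>"
            + "<ul><li><div><div><a href='/nested'>nested</a></div></div></li></ul>"
            + "</li></ul></body></html>";

    // Returns the href attribute of every element the XPath expression selects.
    static List<String> hrefs(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Single slash: only <li> elements that are direct children of the target <ul>
        System.out.println(hrefs("//ul[@id='target']/li/div/div/a"));
        // Double slash: <li> elements at any depth, including those of the nested list
        System.out.println(hrefs("//ul[@id='target']//li/div/div/a"));
    }
}
```

Since every direct child `<li>` is also a descendant, `//ul[5]//li/div/div/a` can return more links than the single-slash version but never fewer; on its own it does not drop the extra `<a>`s that `div/div/a` picks up.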

score 0 · Accepted Answer · answered Feb 15 '22 at 13:22

The problem was that inside //ul[5] there were two kinds of <a>s: //ul[5]/li/div/div/a and //ul[5]/li/div/div[2]/a.

In the first case, the <div> that wraps the <a> has the class name heading-4 (div[@class="heading-4"]/a[1]). In the second case, the <div> that wraps the <a> has the class name heading-4-sub (div[@class="heading-4-sub"]/a[1]).

When I was counting the <a>s, I was getting both kinds of <a>s in the count, because `div` without an index matches either wrapper <div>.

So I had to do something like this:

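```groovy
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement

import com.kms.katalon.core.webui.driver.DriverFactory

WebDriver driver = DriverFactory.getWebDriver()

List<String> hrefs = []
// Only <a>s whose wrapper <div> has class "heading-4"; the "heading-4-sub" ones are skipped
List<WebElement> aTags = driver.findElements(By.xpath('//ul[5]/li/div/div[@class="heading-4"]/a'))

for (WebElement aTag in aTags) {
    String href = aTag.getAttribute("href")
    if (href != null) {
        hrefs.add(href)
    } else {
        hrefs.add('Empty Link') // placeholder for links without an href
    }
}

System.out.println(hrefs + "\n\nURLs Found: " + hrefs.size())
```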

I was using `findElements(By.xpath("//ul[5]/li/div/div/a"))` instead of `findElements(By.xpath('//ul[5]/li/div/div[@class="heading-4"]/a'))`, which gets only the <a>s that are wrapped by a <div> with class name "heading-4".
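The effect of the class predicate can be reproduced offline. This sketch uses the JDK's `javax.xml.xpath` on hypothetical markup modelled on the structure described in this answer (each `<li>` holds one link in a `heading-4` `<div>` and one in a `heading-4-sub` `<div>`); the class name `HeadingClassFilter` and the URLs are made up. As in the answer, a link without an `href` is recorded as `Empty Link`, here detected with `hasAttribute` because DOM returns `""` rather than `null` for a missing attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeadingClassFilter {

    // Hypothetical markup: each <li> wraps one link in div.heading-4
    // and a second link in div.heading-4-sub.
    static final String PAGE = "<html><body><ul>"
            + "<li><div><div class='heading-4'><a href='/one'>One</a></div>"
            + "<div class='heading-4-sub'><a href='/one/sub'>sub</a></div></div></li>"
            + "<li><div><div class='heading-4'><a href='/two'>Two</a></div>"
            + "<div class='heading-4-sub'><a href='/two/sub'>sub</a></div></div></li>"
            + "<li><div><div class='heading-4'><a>Three</a></div>"
            + "<div class='heading-4-sub'><a href='/three/sub'>sub</a></div></div></li>"
            + "</ul></body></html>";

    // Evaluates an XPath expression against PAGE and returns the matched nodes.
    static NodeList select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Same rule as the answer: the href if present, otherwise "Empty Link".
    static List<String> collectHrefs(String expression) {
        NodeList links = select(expression);
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < links.getLength(); i++) {
            Element a = (Element) links.item(i);
            hrefs.add(a.hasAttribute("href") ? a.getAttribute("href") : "Empty Link");
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // No class predicate: links under both wrapper <div>s are counted
        System.out.println(select("//ul/li/div/div/a").getLength()); // 6
        // Class predicate: only links under div.heading-4 remain
        List<String> hrefs = collectHrefs("//ul/li/div/div[@class='heading-4']/a");
        System.out.println(hrefs + "\n\nURLs Found: " + hrefs.size());
    }
}
```

Note that `@class='heading-4'` is an exact string comparison: it would miss an element whose class attribute is, say, `heading-4 active`, while `contains(@class, 'heading-4')` would also match `heading-4-sub`. The usual XPath 1.0 token test is `contains(concat(' ', normalize-space(@class), ' '), ' heading-4 ')`.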

https://docs.katalon.com/katalon-studio/docs/detect_elements_xpath.html#what-is-xpath
