Eliminating duplicate links on a web page and avoiding the stale element error

Question

I have a list of 20 links on a page, and some of them are duplicates. I click the first link, which opens the next page, and I download some files from that page.

Page 1

Link 1
Link 2
Link 3
link 1
link 3
link 4
link 2

Link 1 (click) --> (opens) Page 2

Page 2 (click back button browser) --> (goes back to) Page 1

Then I click on Link 2 and repeat the same steps.

    System.setProperty("webdriver.chrome.driver", "C:\\chromedriver.exe");
    String fileDownloadPath = "C:\\Users\\Public\\Downloads";

    // Set preferences to suppress popups
    Map<String, Object> prefsMap = new HashMap<String, Object>();
    prefsMap.put("profile.default_content_settings.popups", 0);
    prefsMap.put("download.default_directory", fileDownloadPath);
    prefsMap.put("plugins.always_open_pdf_externally", true);
    prefsMap.put("safebrowsing.enabled", "false");

    // Assign driver properties
    ChromeOptions option = new ChromeOptions();
    option.setExperimentalOption("prefs", prefsMap);
    option.addArguments("--test-type");
    option.addArguments("--disable-extensions");
    option.addArguments("--safebrowsing-disable-download-protection");
    option.addArguments("--safebrowsing-disable-extension-blacklist");

    WebDriver driver = new ChromeDriver(option);
    driver.get("http://www.mywebpage.com/");

    List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')]"));
    Thread.sleep(500);

    int pageSize = listOfLinks.size();
    System.out.println("The number of links in the page is: " + pageSize);

    // Iterate through all the links on the page
    for (int i = 0; i < pageSize; i++)
    {
        System.out.println("Clicking on link: " + i);
        String linkText;
        try
        {
            linkText = listOfLinks.get(i).getText();
            listOfLinks.get(i).click();
        }
        catch (org.openqa.selenium.StaleElementReferenceException ex)
        {
            // The old references are invalid after navigating back, so locate the links again
            listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')]"));
            linkText = listOfLinks.get(i).getText();
            listOfLinks.get(i).click();
        }
        try
        {
            driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();
        }
        catch (org.openqa.selenium.NoSuchElementException ee)
        {
            // Nothing to download on this page, go back and try the next link
            driver.navigate().back();
            Thread.sleep(300);
            continue;
        }
        Thread.sleep(300);
        driver.navigate().back();
        Thread.sleep(100);
    }

The code works fine: it clicks on all the links and downloads the files. Now I need to improve the logic to omit the duplicate links. I tried filtering the duplicates out of the list, but then I am not sure how to handle the org.openqa.selenium.StaleElementReferenceException. The solution I am looking for clicks on the first occurrence of a link and skips the link if it occurs again.

(This is part of complex logic to download multiple files from a portal that I don't have control over, so please don't come back with questions like why there are duplicate links on the page in the first place.)

Hi, what if you add the already visited links to a separate variable and check, before each transition, whether the next link is already in the list of visited ones? – AtachiShadow, Sep 06 '19 at 01:25
Check my answer for a detailed explanation of how to get only the unique links and handle the stale elements. Let me know if you have any questions. – supputuri, Sep 06 '19 at 02:23
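
The "visited links" idea from the comment above can be sketched in plain Java. This is only a sketch under assumptions: the class name `UniqueHrefs`, the method `firstOccurrences`, and the sample paths are hypothetical, and the hrefs are assumed to have already been read from the page as strings (for example with `getAttribute("href")` on each element). Strings cannot go stale the way `WebElement` references do, so each remaining href could then be opened with `driver.get(href)` instead of clicking an element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class UniqueHrefs {
    // Keeps the first occurrence of every href and preserves the page order.
    // A LinkedHashSet ignores repeated values but remembers insertion order.
    static List<String> firstOccurrences(List<String> hrefs) {
        return new ArrayList<>(new LinkedHashSet<>(hrefs));
    }

    public static void main(String[] args) {
        // Hypothetical hrefs in the order they appear on the page
        List<String> hrefs = Arrays.asList(
                "/Link1", "/Link2", "/Link3", "/Link1", "/Link3", "/Link4", "/Link2");
        System.out.println(firstOccurrences(hrefs));
        // prints [/Link1, /Link2, /Link3, /Link4]
    }
}
```

The comparison is an exact string match, so hrefs that differ only in case or in a trailing slash would count as different links.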

Spencer Melo · Accepted Answer · 2019-09-06T05:32:18.540

2

First, I don't suggest sending requests (findElements) to the WebDriver repeatedly. You will see performance issues if you go down this path, especially if you have a lot of links and pages.

Also, if you always work in the same tab, you need to wait for two page loads per link (the page with the links and the page with the download). If you open each link in a new tab, you only need to wait for the download page to load.

My suggestion: filter out the repeated links as @supputuri said and open each link in a NEW tab. That way you don't need to handle stale elements, you don't need to search the page for the links every time, and you don't need to wait for the links page to reload on each iteration.

List<WebElement> uniqueLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));

for ( int i = 0; i < uniqueLinks.size(); i++)
{
    // Ctrl+click opens the link in a new tab, so the page with the links never reloads
    new Actions(driver)
         .keyDown(Keys.CONTROL)
         .click(uniqueLinks.get(i))
         .keyUp(Keys.CONTROL)
         .build()
         .perform();
    // getWindowHandles() returns a Set, so it is copied into a list to access a handle by index.
    // If you want, you can create the list on its own line here instead of inside the call below.
    driver.switchTo().window(new ArrayList<>(driver.getWindowHandles()).get(1));
    // wait for the download page to load
    driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();
    // wait for the download to finish
    driver.close();
    driver.switchTo().window(new ArrayList<>(driver.getWindowHandles()).get(0));
}

I'm not somewhere I can test this code properly right now. If you hit any issues with it, just comment and I will update the answer, but the idea is sound and pretty simple.
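
Since `getWindowHandles()` returns a `Set<String>`, picking the new tab by index relies on the order of that set. A possible alternative, sketched below in plain Java, is to compare the handles before and after opening the tab and take the one that was added. The class name `NewTabHandle`, the method `findNewHandle`, and the handle values are hypothetical; in a real script the two sets would come from `driver.getWindowHandles()` called before and after the Ctrl+click.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class NewTabHandle {
    // Returns the single handle that is present in 'after' but not in 'before'.
    static String findNewHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        if (added.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one new window, found " + added.size());
        }
        return added.iterator().next();
    }

    public static void main(String[] args) {
        // Hypothetical handle values standing in for driver.getWindowHandles()
        Set<String> before = new LinkedHashSet<>(Arrays.asList("main-tab"));
        Set<String> after = new LinkedHashSet<>(Arrays.asList("main-tab", "download-tab"));
        System.out.println(findNewHandle(before, after)); // prints download-tab
    }
}
```

The returned handle would be passed to `driver.switchTo().window(...)`, and the handle saved from `driver.getWindowHandle()` before the click can be used to switch back.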

edited Sep 06 '19 at 05:32

answered Sep 06 '19 at 03:01

Spencer Melo


I like the idea of opening the links in new tabs, but we might have to think about opening the href in the new tab. To do that, you have to get the href (not click on the link), open the new window or tab with that href using JS, and handle the new window for the downloads (which might give you some issues when running in IE). Also consider the resources consumed by each window instance (Selenium treats each tab as a separate window). – supputuri Sep 06 '19 at 03:12
I don't know the application the OP is using, but usually the application has a cache, which keeps the performance impact low. It's up to the OP how he wants to implement it, but those are the two things to watch out for if you go down this route. – supputuri Sep 06 '19 at 03:14
@Spencer Thanks for the great solution approach. driver.switchTo().window(driver.getWindowHandles().get(1)) this is giving syntax error. get(1) is the right syntax? – Prem Sep 06 '19 at 03:47
Try with `driver.switchTo().window(new ArrayList (driver.getWindowHandles()).get(1));` FYI, this might not work in FF and IE. – supputuri Sep 06 '19 at 03:51
@supputuri it just opens the first link in the same window. Now I have 14 unique links and thought it would open 14 new tabs. I am using Chrome. – Prem Sep 06 '19 at 04:06
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199046/discussion-between-prem-and-supputuri). – Prem Sep 06 '19 at 04:27
Hey, sorry it should be Keys.CONTROL and not command, and also the part from supputuri is right, creating the ArrayList, will edit response to use it, thanks @supputuri – Spencer Melo Sep 06 '19 at 05:32

supputuri · Answer 2 · 2019-09-06T04:56:37.333

First, let's look at the XPath.

Sample HTML:

<!DOCTYPE html>
<html>
 <body>
 <div>
  <a href='https://google.com'>Google</a>
  <a href='https://yahoo.com'>Yahoo</a>
  <a href='https://google.com'>Google</a>
  <a href='https://msn.com'>MSN</a>
 </div>
 </body>
</html>

Here is the XPath that returns the distinct links from the HTML above.

//a[not(@href = following::a/@href)]

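The logic of the XPath is that a link is returned only if its href does not match the href of any following link. If it matches one, the link is treated as a duplicate and the XPath does not return that element. Note that this keeps the last occurrence of each duplicated href; replace following:: with preceding:: to keep the first occurrence instead.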
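
The behaviour of this XPath can be checked outside the browser. The sketch below uses the XPath 1.0 engine that ships with the JDK (`javax.xml.xpath`) on a well-formed copy of the sample HTML; the class name `DistinctLinksXPath` and the method `hrefs` are made up for this illustration, and a browser driven by Selenium is assumed to evaluate the expression the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DistinctLinksXPath {
    // Well-formed copy of the sample HTML from the answer
    static final String SAMPLE = "<html><body><div>"
            + "<a href='https://google.com'>Google</a>"
            + "<a href='https://yahoo.com'>Yahoo</a>"
            + "<a href='https://google.com'>Google</a>"
            + "<a href='https://msn.com'>MSN</a>"
            + "</div></body></html>";

    // Evaluates the XPath against SAMPLE and returns the href of every matched <a>.
    static List<String> hrefs(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // following:: drops the first Google link and keeps the second one
        System.out.println(hrefs("//a[not(@href = following::a/@href)]"));
        // preceding:: keeps the first Google link and drops the second one
        System.out.println(hrefs("//a[not(@href = preceding::a/@href)]"));
    }
}
```

Both expressions return three links for the sample page; they differ only in which of the two Google links survives.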

Stale element: Now it's time to handle the stale element issue in your code. The moment you click on Link 1, all the references stored in listOfLinks become invalid, because Selenium assigns new references to the elements each time they load on the page. When you try to access the elements through the old references, you get a StaleElementReferenceException. Here is a snippet of code that should give you an idea.

List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));
Thread.sleep(500);
int pageSize = listOfLinks.size();
System.out.println("The number of links in the page is: " + pageSize);
// iterate through all the links on the page
for (int i = 0; i < pageSize; i++)
{
    // ===> consider adding an explicit wait (WebDriverWait) for the link elements matching
    //      "//a[contains(@href,'Link')][not(@href = following::a/@href)]" instead of hard-coding a sleep
    // ===> added this line: locate the link again on every iteration so the reference is never stale
    WebElement link = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]")).get(i);
    System.out.println("Clicking on link: " + i);
    // ===> updated next 2 lines
    String linkText = link.getText();
    link.click();
    // ===> consider adding an explicit wait using WebDriverWait to make sure the span exists before clicking
    driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();
    // ===> check this answer (https://stackoverflow.com/questions/34548041/selenium-give-file-name-when-downloading/56570364#56570364)
    //      to make sure the download has completed before clicking the browser back button, rather than sleeping for x seconds
    driver.navigate().back();
    // ===> removed hard-coded wait time (sleep)
}
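
The comments in the snippet above recommend waiting on a condition instead of sleeping for a fixed time. The sketch below shows that polling idea in plain Java; it is not Selenium's `WebDriverWait`, and the names `PollingWait`, `waitUntil`, and `noPartialDownloads` are hypothetical. The download check assumes Chrome's habit of giving unfinished downloads a `.crdownload` suffix, and it only notices partial files, so a download that has not started yet would look finished.

```java
import java.io.File;
import java.util.function.BooleanSupplier;

class PollingWait {
    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition held, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Assumption: Chrome keeps a "*.crdownload" file while a download is in progress.
    // Returns false when the directory cannot be listed (for example, it does not exist).
    static boolean noPartialDownloads(File dir) {
        File[] partial = dir.listFiles((d, name) -> name.endsWith(".crdownload"));
        return partial != null && partial.length == 0;
    }

    public static void main(String[] args) {
        // Download directory taken from the question
        File downloads = new File("C:\\Users\\Public\\Downloads");
        boolean done = waitUntil(() -> noPartialDownloads(downloads), 5_000, 250);
        System.out.println("Downloads finished: " + done);
    }
}
```

In the loop above, a call like this could sit between the click on the download span and `driver.navigate().back()`.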


Edit1:

If you want to open each link in a new window, use the logic below.

// Selenium 3 constructor; Selenium 4 takes a Duration instead of a number of seconds
WebDriverWait wait = new WebDriverWait(driver, 20);
wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]")));
List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));
JavascriptExecutor js = (JavascriptExecutor) driver;
for (WebElement link : listOfLinks) {
    // get the href
    String href = link.getAttribute("href");
    // open the link in a new tab; passing the href as an argument avoids quoting problems in the script
    js.executeScript("window.open(arguments[0])", href);
    // switch to the new tab
    ArrayList<String> tabs = new ArrayList<String>(driver.getWindowHandles());
    driver.switchTo().window(tabs.get(1));
    // click on download (locator taken from the question)
    driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();
    // close the new tab
    driver.close();
    // switch back to the parent window
    driver.switchTo().window(tabs.get(0));
}


Added the information to get the distinct link element (excluding duplicate links) using xpath. — supputuri, Sep 06 '19 at 02:23

score -1 · Answer 3 · answered Sep 06 '19 at 01:27


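You can do it like this.

Save the index of each element in the list to a map, keyed by the link text.
If the map already contains the text, skip it.
Once done, the map has only unique elements, i.e. the first ones found.

The values of the map are the indexes into listOfLinks.

        // LinkedHashMap is used in place of Hashtable so the links are clicked in page order
        Map<String, Integer> hs1 = new LinkedHashMap<>();
        for (int i = 0; i < listOfLinks.size(); i++) {
            String text = listOfLinks.get(i).getText();
            // store only the index of the first occurrence of each link text
            if (!hs1.containsKey(text)) {
                hs1.put(text, i);
            }
        }
        for (int i : hs1.values()) {
            listOfLinks.get(i).click();
        }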


Arun Nair


Hi @Arun I tried this way. The problem is the exception section. After I click on the back button and come back to Page 1 the list of links is already stale. There should be a better way of handling this which I am not able to figure out. catch(org.openqa.selenium.StaleElementReferenceException ex) { listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')]")); linkText = listOfLinks.get(i).getText(); listOfLinks.get(i).click(); } – Prem Sep 06 '19 at 01:48
@Prem Just FYI, Selenium refreshes the element references whenever you click on an element and the elements on the page reload, so you can no longer use the old element references. Check my answer below and share your ideas/comments, if any. – supputuri Sep 06 '19 at 02:29
