
I am trying to scrape a site with Selenium in C# that spreads its results over several pages, which I can step through by clicking a "Next" button. I usually get a stale element reference error, but ONLY when I run the program without breakpoints. If I step through it in the debugger, it works perfectly fine. I'm assuming that Selenium is moving on before the page has finished updating (even though I have a wait method implemented).

Here is the main logic:

foundVacancies.AddRange(FindVacanciesOnPage());
const string nextBtnXPath = "//*[@id=\"ContainerResultList\"]/div/div[3]/nav/ul/li[8]/a";
if (Driver.FindElements(By.XPath(nextBtnXPath)).Count != 0)
{
    while (TryClickingNextButton(nextBtnXPath))
    {
        foundVacancies.AddRange(FindVacanciesOnPage());
    }
}

This code first gets all items on the first page and adds them to the foundVacancies list. It then looks for the "Next" button, which is not present when there are too few items to fill more than one page. If the button exists, the code clicks it, scrapes the page, and repeats until there are no pages left. This works great when debugging, but something goes wrong when the program runs normally.

This is the method that gets all items on the page, and where the error occurs:

private IEnumerable<string> FindVacanciesOnPage()
{
    var vacancies = new List<string>();

    var tableContainingAllVacancies = Driver.FindElement(By.XPath("//*[@id=\"ContainerResultList\"]/div/div[2]/div/ul"));
    var listOfVacancies = tableContainingAllVacancies.FindElements(By.XPath(".//li/article/div[1]/a"));

    foreach (var vacancy in listOfVacancies)
    {
        vacancies.Add(vacancy.FindElement(By.XPath(".//h2")).Text);
    }

    return vacancies;
}

The items are in a <ul> element with <li> children, which I go through one by one to get their inner text. The stale element error occurs in the foreach loop. I'm assuming that the web driver didn't have time to reload the DOM, because it works when I step through with breakpoints. However, I do have a method that waits until the page is fully loaded, which I call when going to the next page.

private bool TryClickingNextButton(string nextButtonXPath)
{
    var nextButton = Driver.FindElement(By.XPath(nextButtonXPath));

    var currentUrl = Driver.Url;
    ScrollElementIntoView(nextButton);
    nextButton.Click();
    WaitUntilLoaded();
    var newUrl = Driver.Url;

    return !currentUrl.Equals(newUrl);
}

I compare the old and new URLs to determine whether this was the last page. The WaitUntilLoaded method looks like this:

var wait = new WebDriverWait(Driver, TimeSpan.FromSeconds(30));
wait.Until(x => ((IJavaScriptExecutor) Driver).ExecuteScript("return document.readyState").Equals("complete"));

Oddly enough, sometimes the web driver just closes immediately after loading the first page, without any errors or results. I spent a lot of time debugging and searching on SO, but can't seem to find any information, because the code works perfectly fine when I step through it with breakpoints.

I have only tried Chrome, with and without headless mode, but I don't see how this could be a Chrome problem.

The "Next" button has the following HTML:

<a href="" data-jn-click="nextPage()" data-ng-class="{'disabled-element':currentPage === totalPages}" tabindex="0">
    <span class="hidden-md hidden-sm hidden-xs">Next <span class="icon icon-pagination-single-forward"></span></span>
    <span class="hidden-lg icon icon-pagination-forward-enable"></span>
</a>

I couldn't find out what data-jn-click is. I tried to just execute the JavaScript nextPage();, but that didn't do anything.

Johannes Mols
  • I suspect the webpage is being updated dynamically while you are reading the `Text` of each successive `.//h2`. Or the page is not completely loaded when you start looking for the first text. I would recommend adding a certain wait time and using a single `FindElements` call instead of multiple `FindElement` calls, adding each element found to the `vacancies` list, which should give you better performance and reliability. – kurakura88 Apr 23 '18 at 01:34
  • I think Selenium has the ability to wait on a particular class to be loaded which helps your testing to be more robust. Check this link https://seleniumhq.github.io/selenium/docs/api/dotnet/html/T_OpenQA_Selenium_Support_UI_WebDriverWait.htm – sjmarsh Apr 23 '18 at 04:41
  • @kurakura88 I think that you're right, I edited my question and added the button/link HTML. I couldn't find out what "data-jn-click" exactly is. I will try to implement something to wait for a second or two and see what it does. – Johannes Mols Apr 23 '18 at 05:48
  • @sjmarsh I tried to wait with the JavaScript method in my question, but I will also try to wait for the elements I need and see if that works! Will let you know. – Johannes Mols Apr 23 '18 at 05:50
  • Do you have to change the state of the page at all to target the elements that become stale? If you have to turn a page or click any link/button, then the elements that were previously not displayed will become stale because their css will have changed – st0ve Apr 23 '18 at 06:24
  • Yes, the stale element exception happens when I click on a button that takes me to the next page, which is exactly the same, just with different elements. When I get the elements in FindVacanciesOnPage(), I'm getting a new instance of the elements by searching for them again, which is why I don't understand the stale element error. I'm not reusing items. – Johannes Mols Apr 23 '18 at 06:32

3 Answers

0

I don't have any experience with C#, so please bear with me if I'm wrong. You are using FindElements and storing the result in var listOfVacancies. Based on a few references I looked at, why not use ReadOnlyCollection<IWebElement>? It is better to store all elements as a list and iterate through it. The code then becomes:

ReadOnlyCollection<IWebElement> listOfVacancies = tableContainingAllVacancies.FindElements(By.XPath(".//li/article/div[1]/a"));
kripindas
  • FindElements actually returns a list of IWebElements. Using "var" in C# just improves readability if it is clear which type the variable has. Thanks for the answer though! – Johannes Mols Apr 23 '18 at 05:51
  • @JohannesMols Please check the link below. Maybe something in it will help you fix the issue. [Refer this link](https://stackoverflow.com/questions/45002008/selenium-stale-element-reference-element-is-not-attached-to-the-page) – kripindas Apr 23 '18 at 06:29
0

If the elements that are going into listOfVacancies are being populated via an Ajax call, then document.readyState won't catch that. Try using:

// ExecuteScript returns a JavaScript number as a long, so compare against 0L rather than the string "0".
wait.Until(x => ((IJavaScriptExecutor) Driver).ExecuteScript("return jQuery.active").Equals(0L));
st0ve
  • Thank you for the answer. Unfortunately, this condition is never met and the WebDriverWait times out, so I'm guessing the website isn't using Ajax. – Johannes Mols Apr 23 '18 at 06:08
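
When neither document.readyState nor jQuery.active reflects the update, one alternative is to wait for something observable in the DOM itself. The sketch below is only an illustration under assumptions: it supposes that the page replaces the old list items when "Next" is clicked, so an element located before the click becomes stale once the new page has rendered. `PageTurnWait` and `ClickAndWaitForRefresh` are hypothetical names, not part of Selenium; `WebDriverWait` and `StaleElementReferenceException` are the regular Selenium types.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

public static class PageTurnWait
{
    // Hypothetical helper: clicks nextButton, then blocks until oldItem
    // (an element located BEFORE the click) has been detached from the DOM.
    public static void ClickAndWaitForRefresh(
        IWebDriver driver, IWebElement nextButton, IWebElement oldItem, TimeSpan timeout)
    {
        nextButton.Click();

        var wait = new WebDriverWait(driver, timeout);
        wait.Until(_ =>
        {
            try
            {
                // Accessing a property of a detached element throws.
                var unused = oldItem.Enabled;
                return false; // still attached: the list has not been replaced yet
            }
            catch (StaleElementReferenceException)
            {
                return true; // old element is gone: the list was re-rendered
            }
        });
    }
}
```

If the page reuses its DOM nodes instead of replacing them, or if the click happens on the last page and nothing changes, the old element never becomes stale and the wait ends in a WebDriverTimeoutException, so the caller would have to handle that case.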
0

I finally found a way to solve this issue. It's dirty, but it works. I tried many different approaches to waiting until the page is fully loaded, but none worked. So I went down the dark path of Thread.Sleep, but it's not as bad as it sounds:

private IEnumerable<string> FindVacanciesOnPage()
{
    return FindVacanciesOnPage(new List<string>(), 0, 50, 15000);
}

private IEnumerable<string> FindVacanciesOnPage(ICollection<string> foundVacancies, long waitedTime, int interval, long maxWaitTime)
{
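    try
    {
        var list = Driver.FindElements(By.XPath("//*[@data-ng-bind=\"item.JobHeadline\"]"));
        // Collect into a temporary list first, so that an exception midway
        // does not leave partial results behind and cause duplicates on retry.
        var texts = new List<string>();
        foreach (var vacancy in list)
        {
            texts.Add(vacancy.Text);
        }
        foreach (var text in texts)
        {
            foundVacancies.Add(text);
        }
    }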
    catch (Exception)
    {
        if (waitedTime >= maxWaitTime) throw;

        Thread.Sleep(interval);
        waitedTime += interval;

        return FindVacanciesOnPage(foundVacancies, waitedTime, interval, maxWaitTime);
    }

    return foundVacancies;
}

This tries to get the items and, if an exception is thrown, waits for the given interval before trying again. Once the specified maximum time has been waited, the exception is finally rethrown.
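
The same retry-with-timeout idea can also be written as a loop rather than a recursive call. The following is a minimal sketch, not the code used above: `Retry.Until` is a hypothetical helper that has nothing Selenium-specific in it and simply re-runs a delegate until it succeeds or the time budget is used up.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Retry
{
    // Hypothetical helper, not part of Selenium: runs `action` until it returns
    // without throwing. Once `maxWait` has elapsed, the next exception is no
    // longer caught and propagates to the caller.
    public static T Until<T>(Func<T> action, TimeSpan interval, TimeSpan maxWait)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                return action();
            }
            catch (Exception) when (stopwatch.Elapsed < maxWait)
            {
                // Still inside the time budget: pause, then try again.
                Thread.Sleep(interval);
            }
        }
    }
}
```

Assuming System.Linq is imported, it could be called as `Retry.Until(() => Driver.FindElements(By.XPath("//*[@data-ng-bind=\"item.JobHeadline\"]")).Select(e => e.Text).ToList(), TimeSpan.FromMilliseconds(50), TimeSpan.FromSeconds(15))`. Each attempt builds a fresh list, so a failure partway through the elements leaves no partial results behind.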

Johannes Mols