I am trying to scrape a site with Selenium in C#. The results are spread over several pages, which I go through by clicking a "Next" button. I usually get a stale element reference error (`StaleElementReferenceException`), but ONLY when I run the program without breakpoints. If I step through the program, it works perfectly fine. I'm assuming that Selenium moves on without waiting for something important, even though I have implemented a wait method.
Here is the main logic for the problem:
```csharp
foundVacancies.AddRange(FindVacanciesOnPage());

const string nextBtnXPath = "//*[@id=\"ContainerResultList\"]/div/div[3]/nav/ul/li[8]/a";
if (Driver.FindElements(By.XPath(nextBtnXPath)).Count != 0)
{
    while (TryClickingNextButton(nextBtnXPath))
    {
        foundVacancies.AddRange(FindVacanciesOnPage());
    }
}
```
This code first gets all items on the first page and adds them to the `foundVacancies` list. It then looks for the "Next" button, which is not always present, because there may not be enough items for a second page. If the button is there, the code clicks it, scrapes the page, and clicks it again until there are no pages left. This works great when debugging, but something goes very wrong when the program runs normally.
This is the method that gets all items on a page, and where the error occurs:
```csharp
private IEnumerable<string> FindVacanciesOnPage()
{
    var vacancies = new List<string>();
    var tableContainingAllVacancies = Driver.FindElement(By.XPath("//*[@id=\"ContainerResultList\"]/div/div[2]/div/ul"));
    var listOfVacancies = tableContainingAllVacancies.FindElements(By.XPath(".//li/article/div[1]/a"));
    foreach (var vacancy in listOfVacancies)
    {
        vacancies.Add(vacancy.FindElement(By.XPath(".//h2")).Text);
    }
    return vacancies;
}
```
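`FindVacanciesOnPage` collects all the `IWebElement` references first and reads their text afterwards, so if the `<ul>` is replaced while the `foreach` is running, the remaining references point at removed nodes and throw. A quick way to test that hypothesis is to re-run the whole read when it fails partway. The helper below is a hypothetical, Selenium-free sketch of such a retry; with Selenium, `TException` would be `StaleElementReferenceException` (from `OpenQA.Selenium`).

```csharp
using System;
using System.Threading;

public static class StaleRetry
{
    // Hypothetical helper (not part of Selenium): runs `action` and, if it throws
    // TException, sleeps for `delay` and runs the whole action again. After
    // `maxAttempts` attempts in total, the exception is allowed to propagate.
    public static T Retry<T, TException>(Func<T> action, int maxAttempts, TimeSpan delay)
        where TException : Exception
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                // Re-running the whole action means every element is looked up again.
                return action();
            }
            catch (TException) when (attempt < maxAttempts)
            {
                // Not the last attempt yet: wait briefly, then start over from scratch.
                Thread.Sleep(delay);
            }
        }
    }
}
```

For example: `foundVacancies.AddRange(StaleRetry.Retry<IEnumerable<string>, StaleElementReferenceException>(FindVacanciesOnPage, 3, TimeSpan.FromMilliseconds(500)));`. A retry only hides the symptom, though: a read that completes before the list is swapped still returns the previous page's items, so this mainly helps confirm the timing theory.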
The items are in a `<ul>` element with `<li>` children, which I go through one by one to get their inner text. The stale element error occurs in the `foreach` loop. I'm assuming the web driver didn't have time to reload the DOM, because it works when I use breakpoints. However, I do have a method that waits until the page is fully loaded, which I call when going to the next page:
```csharp
private bool TryClickingNextButton(string nextButtonXPath)
{
    var nextButton = Driver.FindElement(By.XPath(nextButtonXPath));
    var currentUrl = Driver.Url;
    ScrollElementIntoView(nextButton);
    nextButton.Click();
    WaitUntilLoaded();
    var newUrl = Driver.Url;
    return !currentUrl.Equals(newUrl);
}
```
I am comparing the new and old URL to determine whether this was the last page. The `WaitUntilLoaded` method looks like this:
```csharp
var wait = new WebDriverWait(Driver, TimeSpan.FromSeconds(30));
wait.Until(x => ((IJavaScriptExecutor) Driver).ExecuteScript("return document.readyState").Equals("complete"));
```
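`document.readyState` describes the loading of the document itself; it does not change when scripts update the page after it has loaded. If clicking "Next" updates the list in place via JavaScript (the `data-ng-class` attribute on the button suggests AngularJS, but that is an assumption), the state may still be `complete` and the wait can return immediately. A sketch of a different condition: record a value that identifies the current page, such as the first vacancy title, before clicking, then poll until it changes. `WaitForChange` is a hypothetical helper written with the standard library only so that it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class PageChangeWait
{
    // Hypothetical helper: calls `read` repeatedly until it returns a value that differs
    // from `previous`, sleeping `pollInterval` between calls. Throws TimeoutException
    // if the value is still unchanged once `timeout` has elapsed.
    public static T WaitForChange<T>(Func<T> read, T previous, TimeSpan timeout, TimeSpan pollInterval)
    {
        var comparer = EqualityComparer<T>.Default;
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            var current = read();
            if (!comparer.Equals(current, previous))
                return current; // new content is present, e.g. the next page's first title

            if (stopwatch.Elapsed >= timeout)
                throw new TimeoutException($"Value did not change within {timeout}.");

            Thread.Sleep(pollInterval);
        }
    }
}
```

With Selenium, the same predicate can be passed to `WebDriverWait.Until`, which also polls; calling `IgnoreExceptionTypes(typeof(StaleElementReferenceException))` on the wait makes it keep polling instead of failing if the title element is swapped out between `FindElement` and `.Text`.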
Oddly enough, the web driver sometimes just closes immediately after loading the first page, without any errors or results. I have spent a lot of time debugging and searching Stack Overflow, but I can't find anything relevant, and stepping through the code doesn't help because it works perfectly fine that way.
I have only tried Chrome, with and without headless mode, but I don't see how this could be a Chrome-specific problem.
The "Next" button has the following HTML:
```html
<a href="" data-jn-click="nextPage()" data-ng-class="{'disabled-element':currentPage === totalPages}" tabindex="0">
    <span class="hidden-md hidden-sm hidden-xs">Next <span class="icon icon-pagination-single-forward"></span></span>
    <span class="hidden-lg icon icon-pagination-forward-enable"></span>
</a>
```
I couldn't find out what `data-jn-click` is. I tried simply executing the JavaScript `nextPage();`, but that didn't do anything.
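The `data-ng-class` binding on the "Next" link adds the class `disabled-element` when `currentPage === totalPages`, so the link's own `class` attribute may be a more direct end-of-results signal than comparing URLs. Below is a minimal sketch of that check, assuming the class really is applied to the `<a>` element as the binding suggests. `HasCssClass` is a hypothetical helper, and its input would be something like `nextButton.GetAttribute("class")`.

```csharp
using System;
using System.Linq;

public static class NextButtonState
{
    // Hypothetical helper: true if the whitespace-separated class list contains
    // `className` as a whole token. `classAttribute` would come from the Next link's
    // "class" attribute; a null or empty attribute means "no classes".
    public static bool HasCssClass(string classAttribute, string className)
    {
        if (string.IsNullOrWhiteSpace(classAttribute))
            return false;

        return classAttribute
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // null separator = split on whitespace
            .Contains(className, StringComparer.Ordinal);
    }
}
```

Under that assumption, the paging loop would stop before clicking when `HasCssClass(nextButton.GetAttribute("class"), "disabled-element")` returns `true`. Splitting into tokens rather than calling `string.Contains` avoids false matches on longer class names that merely contain `disabled-element`.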