Web scraping ignore "Next" or ">" when hidden (Selenium, Python)

Question

I am using Selenium for Python to scrape a site with multiple pages. To get to the next page, I use driver.find_element(By.XPATH, xpath). However, the XPath changes from page to page, so I want to locate the element by other attributes instead.

I tried to find it by class, using "page-link": driver.find_element(By.CLASS_NAME, "page-link"). However, the "page-link" class is also present in the disabled list item. As a result, the Selenium driver won't stop after the last page, in this case page 2.

I want to stop the driver from clicking the disabled item on the page, i.e. I want it to ignore the last item in the list, the one with class "page-item disabled" and aria-disabled="true" (its inner span has aria-hidden="true"). The idea is that if the script can't find an enabled ">" button, it ends the while loop that pages through the results.

See the source code below.

Please advise.

<nav>
<ul class="pagination">
<li class="page-item">
    <a class="page-link" href="https://www.blucap.net/app/FlightsReport?fromdate=2023-02-01&amp;todate=2023-02-28&amp;filterByMemberId=&amp;view=View%20Report&amp;page=1" rel="prev" aria-label="&laquo; Previous">&lsaquo;</a>
</li>
<li class="page-item">
    <a class="page-link" href="https://www.blucap.net/app/FlightsReport?fromdate=2023-02-01&amp;todate=2023-02-28&amp;filterByMemberId=&amp;view=View%20Report&amp;page=1">1</a>
</li>
<li class="page-item active" aria-current="page">
    <span class="page-link">2</span>
</li>
<li class="page-item disabled" aria-disabled="true" aria-label="Next &raquo;">
    <span class="page-link" aria-hidden="true">&rsaquo;</span>
</li>
</ul>
</nav>

score 0 · Answer 1 · answered Feb 19 '23 at 20:34

To go to the Next Page there can be a couple of approaches:

You can use find_element() to locate the <span> child of the <li> whose aria-label starts with "Next" and which does not have aria-disabled="true", and then click it:
```
driver.find_element(By.XPATH, "//li[starts-with(@aria-label, 'Next') and not(@aria-disabled='true')]/span").click()
```
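
The same idea can be written as a loop that stops by itself on the last page. The following is only a sketch under assumptions that the question's HTML does not confirm: that the last <li> in the pagination list is always the "Next" control, and that when it is enabled it wraps an <a class="page-link"> (as the "Previous" item does) rather than a <span>. The names scrape_all_pages, handle_page and max_pages are made up for illustration. It uses find_elements(), which returns an empty list when nothing matches instead of raising an exception.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, spelled out so the
# sketch has no import-time dependency on Selenium.
CSS_SELECTOR = "css selector"

# Assumption: the last <li> of the pagination list is the "Next" control and,
# when enabled, contains an <a class="page-link"> like the "Previous" item.
NEXT_LINK = "ul.pagination > li.page-item:last-child:not(.disabled) > a.page-link"


def scrape_all_pages(driver, handle_page, max_pages=1000):
    """Call handle_page(driver) on each page, following "Next" while enabled.

    Returns the number of pages visited. max_pages is a safety cap so a wrong
    selector cannot cause an endless loop.
    """
    visited = 0
    while visited < max_pages:
        handle_page(driver)
        visited += 1
        # Empty list means there is no enabled "Next" link: last page reached.
        next_links = driver.find_elements(CSS_SELECTOR, NEXT_LINK)
        if not next_links:
            break
        next_links[0].click()
    return visited
```

With a real driver this would be called as scrape_all_pages(driver, my_function), where my_function reads whatever is needed from the current page. Depending on how the site loads, handle_page may need an explicit wait before reading the page.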

score 0 · Answer 2 · answered Feb 20 '23 at 18:56

Alas, the solution offered did not work.

In the end I decided to rely on the http links and, using regex, extract the pages with page larger than 1 (as the first page is the starting page). Works like a charm.
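
Roughly, the approach looks like the sketch below. It is a reconstruction of the idea, not the exact script: the function name later_page_urls and the two patterns are illustrative, and it assumes every pagination link carries a page=N query parameter in its href, as in the HTML in the question.

```python
import re
from html import unescape

HREF = re.compile(r'href="([^"]+)"')
PAGE = re.compile(r'[?&]page=(\d+)')


def later_page_urls(page_source):
    """Return the unique URLs with page > 1, ordered by page number."""
    urls = {}
    for raw in HREF.findall(page_source):
        url = unescape(raw)  # turn &amp; back into & before matching
        match = PAGE.search(url)
        if match and int(match.group(1)) > 1:
            # Keyed by page number, so repeated links collapse to one entry.
            urls[int(match.group(1))] = url
    return [urls[n] for n in sorted(urls)]
```

Called on the first page as later_page_urls(driver.page_source), it gives the list of URLs to visit one by one with driver.get(url). This only covers the pages that are actually linked from the first page, so a site that shows a shortened page list would need the extraction repeated on later pages.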

Thanks for any effort, much appreciated.

Martien Lubberink
