Python Selenium Scraper: Pagination to next page shows error. Scrape protection by the website?

Question

I'm running a Python Selenium script in an AWS Lambda function.

I'm scraping this page: Link

The scraper itself works fine, but pagination to the next page has stopped working. It had worked for many months before that.

I exported a screenshot via:

png = driver.get_screenshot_as_base64()

It shows this page instead of the second page:

I run this code (simplified version):

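from selenium.webdriver.common.by import By

# `driver` and the initial `url` are set up elsewhere (omitted in this simplified version).
while url:
    driver.get(url)
    png = driver.get_screenshot_as_base64()
    print(png)
    # The "next" arrow is the last pagination link on the page.
    button_next = driver.find_elements(By.CLASS_NAME, "PaginationArrowLink-sc-imp866-0")
    try:
        # Indexing happens inside the try block so an empty result list is caught.
        url = button_next[-1].get_attribute("href")
        print("button_next_url: " + str(url))
    except Exception as exc:
        url = ""
        print("Error in URL: " + repr(exc))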

The interesting thing is the printed URL is totally fine and when I open it manually in the browser it loads page 2:

https://www.stepstone.de/5/ergebnisliste.html?what=Berufskraftfahrer&searchorigin=Resultlist_top-search&suid=1faad076-5348-48d8-9834-4e0d9a836e34&of=25&action=paging_next

But "driver.get(url)" leads to the error page on the screenshot.

Is this some sort of scrape protection by the website? Or is there another reason it stopped working from one day to the next?

Maybe you should write information to a log file: which line was executed, what the variables contain, and what error you get in `except`. Maybe you build a wrong URL and the page can't be found. Or maybe the portal changed its HTML and now uses different code and different URLs, so you have to change your code. Without that information in a log there is no clue what causes the problem, and we have no idea what is wrong. Simply put: you have to debug the code. — furas, Jun 18 '22 at 15:25
Maybe you should `.click()` the button instead of using `get(url)`. — furas, Jun 18 '22 at 15:27
Maybe it would be simpler to search by XPath, like `driver.find_element_by_xpath('//a[@title="Nächste"]')` (with `element`, without the `s` at the end, to get a single result) or `//a[@data-at="pagination-next"]`. — furas, Jun 18 '22 at 15:31
@furas The XPath led me to this error: https://stackoverflow.com/questions/72657844/python-selenium-nosuchelementexception-although-the-value-was-found — Max, Jun 18 '22 at 15:58
I see you already got an answer for this error. Sometimes the Python code runs faster than the JavaScript in the browser, and it may need to wait until JavaScript adds the element to the page. — furas, Jun 18 '22 at 16:25
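
The logging suggested in the comments could look roughly like the sketch below. It uses only the standard `logging` module; the helper name `pick_next_url` and the logger name `pagination` are made up for illustration and are not part of the original script. The helper takes the `href` values already read from the pagination links and records what it saw before choosing the last one.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pagination")  # hypothetical logger name
# Set the level on this logger too, in case the root logger was already configured.
log.setLevel(logging.INFO)


def pick_next_url(hrefs):
    """Return the last href in `hrefs`, or None if the list is empty.

    Hypothetical helper: logs the candidates so a failed run shows
    which URLs were actually found on the page.
    """
    log.info("found %d pagination link(s): %r", len(hrefs), hrefs)
    if not hrefs:
        log.warning("no pagination link found; stopping")
        return None
    next_url = hrefs[-1]
    log.info("next url: %s", next_url)
    return next_url
```

In the loop from the question, this would be called as `url = pick_next_url([b.get_attribute("href") for b in button_next])`, so the log shows every candidate link on each page.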

score 1 · Answer 1 · answered Jun 18 '22 at 16:01

The solution was to cut the last part of the URL, the `&action=paging_next` parameter.

from:

https://www.stepstone.de/5/ergebnisliste.html?what=berufskraftfahrer&searchorigin=Resultlist_top-search&of=25&action=paging_next

to:

https://www.stepstone.de/5/ergebnisliste.html?what=berufskraftfahrer&searchorigin=Resultlist_top-search&of=25

I still don't understand why Selenium could not load the original URL when it works manually in the browser, but the scraper is running again.
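
One way to do this in code rather than by hand is sketched below, using only `urllib.parse` from the standard library. The function name `drop_query_params` is hypothetical, and removing `action` is simply what worked here; why the site rejects that parameter is not known.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_params(url, names=("action",)):
    """Return `url` without the query parameters listed in `names`.

    Hypothetical helper; the remaining parameters keep their order.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# The "from" URL quoted in this answer.
original = (
    "https://www.stepstone.de/5/ergebnisliste.html"
    "?what=berufskraftfahrer&searchorigin=Resultlist_top-search"
    "&of=25&action=paging_next"
)
cleaned = drop_query_params(original)
print(cleaned)
```

In the scraping loop, the `href` read from the pagination link would be passed through `drop_query_params` before the next `driver.get(url)`. Note that `urlencode` re-encodes the remaining values, which leaves the URLs shown here unchanged but could alter unusual characters in other URLs.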
