
I'm running a Python Selenium script in an AWS Lambda function.

I'm scraping this page: Link

The scraper itself works fine, but pagination to the next page has stopped working. It had worked for many months before.

I captured a screenshot with:

```python
png = driver.get_screenshot_as_base64()
```
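A base64 string printed to the Lambda log is hard to inspect directly. A minimal sketch for turning a copied string back into an image file, assuming you paste it into a local Python session; the helper name `save_screenshot_b64` and the output file name are hypothetical. Inside Lambda itself, `driver.get_screenshot_as_file()` can write the PNG directly, keeping in mind that `/tmp` is the only writable directory there.

```python
import base64
from pathlib import Path


def save_screenshot_b64(png_b64: str, path: str = "page.png") -> Path:
    """Decode a base64 screenshot string (hypothetical helper) and write it as a PNG file."""
    out = Path(path)
    # get_screenshot_as_base64() returns a str; b64decode accepts str or bytes
    out.write_bytes(base64.b64decode(png_b64))
    return out
```

Open the resulting file in any image viewer to see what the headless browser actually rendered.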

It shows an error page instead of the second page.

I run this code (simplified version):

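```python
from selenium.webdriver.common.by import By

# `driver` and the first results-page `url` are set up earlier in the script
while url:
    driver.get(url)
    png = driver.get_screenshot_as_base64()  # base64-encoded PNG, for debugging
    print(png)
    # find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name(),
    # which was removed in Selenium 4
    button_next = driver.find_elements(By.CLASS_NAME, "PaginationArrowLink-sc-imp866-0")
    try:
        # The last arrow link is "next"; an empty list raises IndexError
        url = button_next[-1].get_attribute("href")
        print("button_next_url: " + str(url))
    except IndexError:
        url = ""
        print("Error in URL")
```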

Interestingly, the printed URL is correct: when I open it manually in a browser, it loads page 2:

https://www.stepstone.de/5/ergebnisliste.html?what=Berufskraftfahrer&searchorigin=Resultlist_top-search&suid=1faad076-5348-48d8-9834-4e0d9a836e34&of=25&action=paging_next

But `driver.get(url)` leads to the error page captured in the screenshot.

Is this some sort of scraping protection on the website? Or is there another reason it stopped working from one day to the next?
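To narrow down whether the browser is being redirected or shown a different page, it can help to log where it actually lands after `driver.get()`. A minimal sketch using only the standard `logging` module and the WebDriver's `current_url` and `title` attributes; `load_page` and the logger name are hypothetical. This cannot prove bot protection on its own; it only shows whether the loaded page differs from the requested one. In Lambda, `logging` output ends up in CloudWatch Logs.

```python
import logging

logger = logging.getLogger("scraper")  # hypothetical logger name
logger.setLevel(logging.INFO)


def load_page(driver, url: str) -> dict:
    """Load `url` and record what the browser actually shows afterwards."""
    driver.get(url)
    info = {
        "requested": url,
        "landed": driver.current_url,  # differs if the site redirected the browser
        "title": driver.title,         # error pages often have a distinctive title
    }
    # Plain string comparison: URL normalisation alone can also make this True
    info["redirected"] = info["landed"] != url
    logger.info("page load: %s", info)
    return info
```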

Max
  • Maybe you should write information to a log file: which line was executed, what your variables contain, and what error you get in `except`. Maybe you build a wrong URL and the page can't be found. Or maybe the portal changed its HTML and now uses different code and different URLs, so you have to change your code. Without information in a log file there is no clue what causes the problem, and we have no idea what is wrong. Simply put: you have to debug the code. – furas Jun 18 '22 at 15:25
  • Maybe you should `.click()` the button instead of using `get(url)`. – furas Jun 18 '22 at 15:27
  • Maybe it would be simpler to search by XPath, e.g. `driver.find_element_by_xpath('//a[@title="Nächste"]')` (with `element`, without the `s` at the end) to get a single result. Or use `//a[@data-at="pagination-next"]`. – furas Jun 18 '22 at 15:31
  • @furas The XPath led me to this error: https://stackoverflow.com/questions/72657844/python-selenium-nosuchelementexception-although-the-value-was-found – Max Jun 18 '22 at 15:58
  • I see you already got an answer for this error. Sometimes Python code runs faster than the JavaScript in the browser, so it may need to wait until JavaScript adds the element to the page. – furas Jun 18 '22 at 16:25

1 Answer


The solution was to remove the last query parameter, `&action=paging_next`, from the URL.

from:

https://www.stepstone.de/5/ergebnisliste.html?what=berufskraftfahrer&searchorigin=Resultlist_top-search&of=25&action=paging_next

to:

https://www.stepstone.de/5/ergebnisliste.html?what=berufskraftfahrer&searchorigin=Resultlist_top-search&of=25
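The fix amounts to dropping the `action` query parameter. Rather than slicing the string by hand, a minimal sketch using `urllib.parse` removes the named parameters and keeps the rest in their original order; `drop_query_params` is a hypothetical helper, and because the query is re-encoded, percent-escapes in other values may be normalised.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_params(url: str, names=("action",)) -> str:
    """Return `url` with the given query parameters removed, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

In the loop, this would wrap the link target, e.g. `url = drop_query_params(button_next[-1].get_attribute("href"))`.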

I still don't understand why Selenium couldn't load the original URL when it works in a manual browser session, but the scraper is running again now.

Max