2

I used Selenium with Python 3 to open a page. The page does not load under Selenium, but it does load in a Firefox private window.

What is the difference, and how can I fix it?

from selenium import webdriver
from time import sleep

driver = webdriver.Firefox()
driver.get('https://google.com') # creating a google cookie
driver.get_cookies() # check google gets cookies
sleep(3.0)
url='https://www.realestate.com.au/buy/in-sydney+cbd%2c+nsw/list-1'
driver.get(url)

Creating a Google cookie is not necessary: the cookie is absent in a Firefox private window too, and the page still loads there. Under Selenium, however, the behavior is different.

I also see that the website returns an [HTTP/2 429 Too Many Requests 173ms] status and the page is blank white. This does not happen in Firefox private mode.

UPDATE:

I turned on the persistent log. Firefox in private mode receives a 429 response too, but it seems the JavaScript then resumes from another URL. This only happens the first time.

Under Selenium, however, the request does not survive the 429 response. It does report something to the cdndex website. I have blocked that website, so you do not see the request go through there. This is still a difference in behavior between Firefox and Selenium.

Selenium with persistent log: Selenium network

Firefox with persistent log: Firefox network

vica
  • How are you getting this 429 error with your current code using Selenium? – Life is complex Dec 31 '21 at 17:55
  • @barej it's definitely some kind of protection on the website's side. Could you be more specific, what expected behavior do you desire? I'd say that you can just clear cookies before that request and use a proper `User-Agent`, but I'm not sure what exactly are you in need of. – Yevgeniy Kosmak Jan 01 '22 at 18:04
  • @YevgeniyKosmak, the behavior between Selenium and firefox is different. That's the point. Where is the difference coming from? – vica Jan 02 '22 at 20:56

2 Answers

0

This is just my hunch after working with Selenium and WebDriver for a while: I suspect that Selenium's default user agent is set to something the server side recognizes, and that the server responds with an error HTTP code and a blank page as a result.

Try setting the user agent to something reasonable and/or stop Selenium from interfering with the defaults.

Another tip is to look at the request using Wireshark or a similar tool to see exactly what is sent over the wire.
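As a minimal sketch of the user-agent suggestion, assuming Firefox and the `general.useragent.override` preference: the helper name `apply_user_agent` and the `EXAMPLE_UA` string are hypothetical placeholders, not values taken from the question. The helper only needs an object with a `set_preference(name, value)` method, which Selenium's `FirefoxOptions` provides.

```python
# Placeholder user-agent string (an assumption for illustration only).
EXAMPLE_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"


def apply_user_agent(options, user_agent):
    """Set a Firefox user-agent override on an options object and return it.

    `options` is expected to behave like selenium's FirefoxOptions,
    i.e. expose set_preference(name, value).
    """
    options.set_preference("general.useragent.override", user_agent)
    return options


# Usage with Selenium (not executed here, needs selenium and geckodriver):
#   from selenium import webdriver
#   options = apply_user_agent(webdriver.FirefoxOptions(), EXAMPLE_UA)
#   driver = webdriver.Firefox(options=options)
```

Note that a matching user agent alone may not be enough if the site looks at other signals.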

Mr. Developerdude
  • In terms of user agent, they look the same. In terms of header, firefox has an extra line of `TE: trailers` . I do not think that python selenium allows you changing any header. – vica Dec 31 '21 at 12:11
  • I have also added an update. – vica Jan 01 '22 at 00:18
0

429 Too Many Requests

The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests within a short period of time. The 429 status code is intended for use with rate-limiting schemes.


Root Cause

When your server detects that a user agent is trying to access a specific page too often in a short period of time, it triggers a rate-limiting feature. The most common example of this is when a user (or an attacker) repeatedly tries to log into a web application.
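To make the rate-limiting idea concrete, here is a rough sketch of a sliding-window limiter. It is an illustration only, not how the site in the question works; the class name `SlidingWindowLimiter` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def status_for(self, client_id, now):
        """Return the HTTP status this request would get at time `now` (seconds)."""
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429  # Too Many Requests
        hits.append(now)
        return 200
```

The client id could be an IP address, a cookie value, or an account, depending on how the server chooses to identify users.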

The server can also identify a user with cookies, rather than by their login credentials. Requests may also be counted on a per-request basis, across your server, or across several servers. So there are a variety of situations that can result in you seeing an error like one of these:

  • 429 Too Many Requests
  • 429 Error
  • HTTP 429
  • Error 429 (Too Many Requests)
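A 429 response may carry a `Retry-After` header, given either as a number of seconds or as an HTTP date. The sketch below shows one way a client could turn that header into a wait time. The function name `retry_after_seconds` is hypothetical, and whether the site in the question sends this header at all is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value into seconds to wait.

    Returns None when the header is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```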

This use case

This use case seems to be a classic case of a Selenium-driven, GeckoDriver-initiated browsing context getting detected as a bot, due to the fact that:

Selenium identifies itself
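One visible way an automated browser identifies itself is the `navigator.webdriver` property, which the WebDriver specification defines for pages to read. The following is a small sketch for checking it from a running session; the helper name `reports_webdriver` is hypothetical, and whether this particular site relies on that flag is an assumption.

```python
def reports_webdriver(driver):
    """Return True if the page can see navigator.webdriver as truthy.

    `driver` is expected to behave like a selenium WebDriver,
    i.e. expose execute_script(script).
    """
    return bool(driver.execute_script("return navigator.webdriver"))


# Usage with Selenium (not executed here, needs selenium and geckodriver):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   print(reports_webdriver(driver))
```

Running the same expression in the console of a normally launched Firefox window gives a point of comparison.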


References

You can find a couple of relevant detailed discussions in:

undetected Selenium