Selenium gets response code of 429 but firefox private mode does not

Question

I used Selenium in Python 3 to open a page. The page does not load under Selenium, but it does load in a Firefox private window.

What is the difference, and how can I fix it?

from time import sleep

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://google.com')  # visit Google first so that it sets a cookie
driver.get_cookies()  # check that the Google cookies are present
sleep(3.0)
url = 'https://www.realestate.com.au/buy/in-sydney+cbd%2c+nsw/list-1'
driver.get(url)  # under Selenium this request ends in a 429 and a blank page

Creating a Google cookie first is not necessary. A Firefox private window has no such cookie either, yet the page loads there without it. Under Selenium, however, the behavior is different.

I also see that the website returns an [HTTP/2 429 Too Many Requests 173ms] status and the page is blank white. This does not happen in Firefox private mode.

UPDATE:

I turned on the persistent log. Firefox in private mode receives a 429 response too, but it seems that the JavaScript then resumes from another URL. This only happens on the first visit.

Under Selenium, however, the request does not survive the 429 response. It does report something to the cdndex website. I have blocked that website, so you do not see the request go through there. This is still a difference in behavior between Firefox and Selenium.

Selenium with persistent log:

Firefox with persistent log:

How are you getting this 429 error with your current code using Selenium? — Life is complex, Dec 31 '21 at 17:55
@barej it's definitely some kind of protection on the website's side. Could you be more specific, what expected behavior do you desire? I'd say that you can just clear cookies before that request and use a proper `User-Agent`, but I'm not sure what exactly are you in need of. — Yevgeniy Kosmak, Jan 01 '22 at 18:04
@YevgeniyKosmak, the behavior between Selenium and firefox is different. That's the point. Where is the difference coming from? — vica, Jan 02 '22 at 20:56

score 0 · Answer 1 · answered Dec 30 '21 at 18:25

This is just my hunch after working with Selenium and WebDriver for a while: I suspect that Selenium's default user agent is set to something the server side recognizes, and that the server answers with an error HTTP code and a blank page as a result.

Try setting the user agent to something reasonable and/or disabling Selenium's interference with the browser defaults.
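As a rough sketch of the first suggestion: with Firefox, the reported user agent can be overridden through the `general.useragent.override` preference, which Selenium passes along via `Options.set_preference`. The helper names `firefox_prefs` and `open_with_prefs` and the example user agent string are made up for illustration, and nothing here confirms that the site's 429 actually depends on the user agent.

```python
# Illustrative user agent string; copy the real one from your own Firefox.
EXAMPLE_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"


def firefox_prefs(user_agent):
    """Return the Firefox preferences that override the reported User-Agent."""
    return {"general.useragent.override": user_agent}


def open_with_prefs(url, prefs):
    """Start Firefox with the given preferences and load url (hypothetical helper)."""
    # Imported lazily so that firefox_prefs() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    driver = webdriver.Firefox(options=options)
    driver.get(url)
    return driver


# Usage (needs Firefox and geckodriver):
# driver = open_with_prefs("https://www.realestate.com.au/", firefox_prefs(EXAMPLE_UA))
```

Keeping the preferences in a plain dict makes it easy to try other overrides the same way.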

Another tip is to look at the request using Wireshark or similar to see exactly what is sent over the wire.
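Once both sets of request headers have been captured (from the browser's network panel or a packet capture), a small diff makes the comparison less error-prone than reading them side by side. This is only a sketch: `diff_headers` is a made-up helper and the header values below are invented sample data, not the real capture.

```python
def diff_headers(a, b):
    """Compare two header mappings, ignoring the case of header names.

    Returns (only_in_a, only_in_b, changed), where changed maps a header
    name to its (a_value, b_value) pair.
    """
    a_norm = {name.lower(): value for name, value in a.items()}
    b_norm = {name.lower(): value for name, value in b.items()}
    only_a = {k: a_norm[k] for k in a_norm.keys() - b_norm.keys()}
    only_b = {k: b_norm[k] for k in b_norm.keys() - a_norm.keys()}
    changed = {
        k: (a_norm[k], b_norm[k])
        for k in a_norm.keys() & b_norm.keys()
        if a_norm[k] != b_norm[k]
    }
    return only_a, only_b, changed


# Invented sample data for illustration only.
firefox_headers = {"Accept": "text/html", "TE": "trailers"}
selenium_headers = {"accept": "text/html"}
only_firefox, only_selenium, changed = diff_headers(firefox_headers, selenium_headers)
```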

answered Dec 30 '21 at 18:25

Mr. Developerdude

In terms of user agent, they look the same. In terms of headers, Firefox sends an extra `TE: trailers` line. I do not think that Python Selenium allows you to change any header. – vica Dec 31 '21 at 12:11
I have also added an update. – vica Jan 01 '22 at 00:18

score 0 · Answer 2 · answered Jan 02 '22 at 23:37

429 Too Many Requests

The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests within a short period of time. The 429 status code is intended for use with rate-limiting schemes.
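A 429 response may carry a `Retry-After` header telling the client how long to wait, given either as a number of seconds or as an HTTP date. Whether this particular site sends one is not known; the sketch below only shows how a client could interpret the value. `retry_after_seconds` is a made-up helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value into a delay in seconds.

    Accepts either a number of seconds or an HTTP date.
    Returns None when the value cannot be parsed.
    """
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means there is no need to wait.
    return max(0.0, (when - now).total_seconds())
```

A client would sleep for the returned number of seconds before retrying, and fall back to its own backoff when the result is `None`.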

Root Cause

When your server detects that a user agent is trying to access a specific page too often in a short period of time, it triggers a rate-limiting feature. The most common example of this is when a user (or an attacker) repeatedly tries to log into a web application.

The server can also identify a bot with cookies, rather than by their login credentials. Requests may also be counted on a per-request basis, across your server, or across several servers. So there are a variety of situations that can result in you seeing an error like one of these:

429 Too Many Requests
429 Error
HTTP 429
Error 429 (Too Many Requests)
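To make the idea concrete, here is a minimal sliding-window limiter of the kind a server might use to decide between 200 and 429. It is a generic illustration, not how this particular website works; the class name, the limits and the client key (which could be an IP address, a cookie or a login) are all assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client key -> timestamps of accepted requests

    def status_for(self, client_key, now):
        """Return the HTTP status the server would answer with at time now."""
        hits = self._hits[client_key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429
        hits.append(now)
        return 200


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
statuses = [limiter.status_for("client-a", t) for t in (0.0, 1.0, 2.0)]
```

Here the third request from the same client inside the window is the one that gets rejected.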

This use case

This seems to be a classic case of a Selenium-driven, GeckoDriver-initiated Firefox browsing context being detected as a bot, due to the fact that:

Selenium identifies itself
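One documented signal is the `navigator.webdriver` property, which the WebDriver specification defines as a flag that page scripts can read when the browser is under automation. Whether this site relies on it is not confirmed. A small check, using a made-up helper name `reports_webdriver`, could look like this:

```python
def reports_webdriver(driver):
    """Return True if page scripts can see that the browser is automated."""
    # execute_script runs the JavaScript in the current page and returns its result.
    return bool(driver.execute_script("return navigator.webdriver"))


# Usage (needs Selenium, Firefox and geckodriver):
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("https://www.realestate.com.au/")
# print(reports_webdriver(driver))
```

Running the same expression in the console of a normal Firefox window gives a point of comparison.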
