50

I am writing a Selenium script in Python, but I can't find any information about:

How to get the HTTP status code from Selenium Python code.

Or am I missing something? If anyone has found a way, please feel free to post.

BBdev
maa_modd

13 Answers

52

It's Not Possible.

Unfortunately, Selenium does not provide this information by design. There is a very lengthy discussion about this, but the short of it is that:

  1. Selenium is a browser emulation tool, not necessarily a testing tool.
  2. Selenium performs many GETs and POSTs during the process of rendering a page and adding an interface for that would complicate the API in ways the authors resist.

We're left with hacks like:

  1. Look for error information in the returned HTML.
  2. Use another tool instead, like Requests (but see the shortcomings of that approach in @Zeinab's answer).
mlissner
  • The only actual answer to the question asked. Thanks! – Xonshiz Dec 18 '19 at 10:50
  • 2
    Your answer is wrong. [Stefan Matei's answer](https://stackoverflow.com/a/39991889) and [Jarad's answer](https://stackoverflow.com/a/63876668) get the status code. – Peilonrayz Apr 22 '21 at 09:57
  • I somewhat agree with the "by design" comment. It's common for scripts in the browser to trigger multiple requests alongside the initial one, so it's not clear which response's status code is meant, unless the browser keeps the first one. – Synru Apr 03 '22 at 10:25
16

I do not have much experience with Python. I have a more detailed Java example here:

https://stackoverflow.com/a/39979509/5703420

The idea is to enable performance logging, which triggers "Network.enable" on chromedriver. Then get the performance log entries and parse them for the "Network.responseReceived" message.

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # enable performance logging (newer chromedriver versions expect the key 'goog:loggingPrefs')
    d = DesiredCapabilities.CHROME.copy()
    d['loggingPrefs'] = { 'performance':'ALL' }

    driver = webdriver.Chrome(executable_path="c:\\windows\\chromedriver.exe", service_args=["--verbose", "--log-path=D:\\temp3\\chromedriverxx.log"], desired_capabilities=d)

    driver.get('https://api.ipify.org/?format=text')

    print(driver.title)

    print(driver.page_source)

    # get_log() empties the log buffer, so fetch the entries once and reuse them
    performance_log = driver.get_log('performance')
    print(str(performance_log).strip('[]'))

    for entry in performance_log:
        print(entry)

The output will contain "Network.responseReceived" for your url, other requests that are done by the page load, or redirect urls. All you have to do is parse the log entries.

'{"message":{"method":"Network.responseReceived","params":{"frameId":"9488.1","loaderId":"9488.1","requestId":"9488.1","response":{"connectionId":14,"connectionReused":false,"encodedDataLength":-1,"fromDiskCache":false,"fromServiceWorker":false,"headers":{"Connection":"keep-alive","Content-Length":"13","Content-Type":"text/plain","Date":"Wed, 12 Oct 2016 06:15:47 GMT","Server":"Cowboy","Via":"1.1 vegur"},"headersText":"HTTP/1.1 200 OK\\r\\nServer: Cowboy\\r\\nConnection: keep-alive\\r\\nContent-Type: text/plain\\r\\nDate: Wed, 12 Oct 2016 06:15:47 GMT\\r\\nContent-Length:13\\r\\nVia:1.1vegur\\r\\n\\r\\n","mimeType":"text/plain","protocol":"http/1.1","remoteIPAddress":"54.197.246.207","remotePort":443,"requestHeaders":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8","Accept-Encoding":"gzip, deflate, sdch, br","Accept-Language":"en-GB,en-US;q=0.8,en;q=0.6","Connection":"keep-alive","Host":"api.ipify.org","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"},"requestHeadersText":"GET /?format=text HTTP/1.1\\r\\nHost: api.ipify.org\\r\\nConnection: keep-alive\\r\\nUpgrade-Insecure-Requests: 1\\r\\nUser-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36\\r\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\\r\\nAccept-Encoding: gzip, deflate, sdch, br\\r\\nAccept-Language: en-GB,en-US;q=0.8,en;q=0.6\\r\\n\\r\\n","securityDetails":{"certificateId":1,"certificateValidationDetails":{"numInvalidScts":0,"numUnknownScts":0,"numValidScts":0},"cipher":"AES_128_GCM","keyExchange":"ECDHE_RSA","protocol":"TLS 
1.2","signedCertificateTimestampList":[]},"securityState":"secure","status":200,"statusText":"OK","timing":{"connectEnd":320.508999997401,"connectStart":3.08100000256673,"dnsEnd":3.08100000256673,"dnsStart":0,"proxyEnd":-1,"proxyStart":-1,"pushEnd":0,"pushStart":0,"receiveHeadersEnd":465.725000001839,"requestTime":78246.775045,"sendEnd":320.995999994921,"sendStart":320.825999995577,"sslEnd":320.435000001453,"sslStart":141.675999999279,"workerReady":-1,"workerStart":-1},"url":"https://api.ipify.org/?format=text"},"timestamp":78247.242716,"type":"Document"}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948094, 'level': 'INFO', 'message': '{"message":{"method":"Network.dataReceived","params":{"dataLength":13,"encodedDataLength":171,"requestId":"9488.1","timestamp":78247.243137}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948094, 'level': 'INFO', 'message': '{"message":{"method":"Page.frameNavigated","params":{"frame":{"id":"9488.1","loaderId":"9488.1","mimeType":"text/plain","securityOrigin":"https://api.ipify.org","url":"https://api.ipify.org/?format=text"}}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948095, 'level': 'INFO', 'message': '{"message":{"method":"Network.loadingFinished","params":{"encodedDataLength":171,"requestId":"9488.1","timestamp":78247.242066}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948115, 'level': 'INFO', 'message': '{"message":{"method":"Page.loadEventFired","params":{"timestamp":78247.264169}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948115, 'level': 'INFO', 'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"9488.1"}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 147625298116, 'level': 'INFO', 'message': 
'{"message":{"method":"Page.domContentEventFired","params":{"timestamp":78247.276475}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948122, 'level': 'INFO', 'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://api.ipify.org/?format=text","frameId":"9488.1","initiator":{"type":"other"},"loaderId":"9488.1","request":{"headers":{"Referer":"https://api.ipify.org/?format=text","User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"},"initialPriority":"High","method":"GET","mixedContentType":"none","url":"https://api.ipify.org/favicon.ico"},"requestId":"9488.2","timestamp":78247.280131,"type":"Other","wallTime":1476252948.11805}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}

and get "status":200 from the json response. You can also parse the response "headers".
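As a minimal sketch of that parsing step, the helper below walks the entries returned by `driver.get_log('performance')` and collects the status of every "Network.responseReceived" event, keyed by URL. The name `statuses_by_url` is made up for illustration, and the entry layout is assumed to match the sample output above.

```python
import json


def statuses_by_url(entries):
    """Map each response URL to its HTTP status code.

    `entries` is the list returned by driver.get_log('performance');
    each entry is a dict whose 'message' value is a JSON string.
    """
    statuses = {}
    for entry in entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        response = message["params"]["response"]
        # Later responses for the same URL overwrite earlier ones.
        statuses[response["url"]] = response["status"]
    return statuses
```

With a live driver you would call `statuses_by_url(driver.get_log('performance'))` and look up the URL you requested. Keep in mind that redirects or a normalized trailing slash can make the logged URL differ from the one you passed to `driver.get()`.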

Stefan Matei
  • get error on mac: `selenium.common.exceptions.WebDriverException: Message: POST /session/4fd2b36a-6c9a-e34d-8e9a-022424c7f36f/log did not match a known command` – user305883 Oct 17 '18 at 19:38
  • 1
    @user305883 It works only for Chrome. Usually this error is thrown when using other browsers (like Firefox). For Firefox you have to dump a log file and then parse it [java example](https://stackoverflow.com/questions/6509628/how-to-get-http-response-code-using-selenium-webdriver/39979509#39979509) – Stefan Matei Oct 18 '18 at 11:05
  • This does not work in Chrome today, at least with Perl. It says this is not a W3C command. – nck Oct 01 '20 at 12:32
10
import json
from selenium.webdriver.chrome.webdriver import WebDriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

chromedriver_path = "YOUR/PATH/TO/chromedriver.exe"
url = "https://selenium-python.readthedocs.io/api.html"
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['goog:loggingPrefs'] = {'performance': 'ALL'}

browser = WebDriver(chromedriver_path, desired_capabilities=capabilities)

browser.get(url)
logs = browser.get_log('performance')

Option 1: if you just want the status code, on the assumption that the page you want it for is the log entry with a 'text/html' content type

def get_status(logs):
    """Return the status of the first text/html response found in the logs."""
    for log in logs:
        if log['message']:
            d = json.loads(log['message'])
            try:
                content_type = 'text/html' in d['message']['params']['response']['headers']['content-type']
                response_received = d['message']['method'] == 'Network.responseReceived'
                if content_type and response_received:
                    return d['message']['params']['response']['status']
            except KeyError:
                # Not a response event, or the header uses different casing.
                pass

Usage:

>>> get_status(logs)
200

Option 2: if you want to see all status codes in the relevant logs

def get_status_codes(logs):
    statuses = []
    for log in logs:
        if log['message']:
            d = json.loads(log['message'])
            if d['message'].get('method') == "Network.responseReceived":
                statuses.append(d['message']['params']['response']['status'])
    return statuses

Usage:

>>> get_status_codes(logs)
[200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]

Note 1: much of this is based on @Stefan Matei's answer; however, a few things have changed between Chrome versions, and I provide an idea of how to parse the logs.

Note 2: the ['content-type'] lookup is not fully reliable, since header casing can vary. Inspect the logs for your use case.
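One way to sidestep the casing problem from Note 2 is a case-insensitive lookup. This is only a sketch; `get_header` is a hypothetical helper, not part of Selenium.

```python
def get_header(headers, name, default=None):
    """Look up a header by name, ignoring case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default
```

In `get_status`, the `['headers']['content-type']` lookup could then become `get_header(response_headers, 'content-type', '')`, where `response_headers` stands for `d['message']['params']['response']['headers']`.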

Jarad
4

I will refer you to a question I asked earlier: How to detect when Selenium loads a browser's error page

The short of it is that unless you want to get uber fancy with something like a squid proxy or browsermob, then you have to go for a dirty solution like below.

Replace

driver.get( "http://google.com" )

with

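from selenium.webdriver.common.by import By

def goTo( url ):
    # Load the page first, then look for Firefox's error-page container.
    driver.get( url )
    ids = [ elem.get_attribute("id") for elem in driver.find_elements(By.CSS_SELECTOR, "body > div") ]
    if "errorPageContainer" in ids:
        raise Exception( "this page is an error" )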

You can get creative and get the error code based on the text displayed in the actual browser. This will have to be customized based on the browser; the one above works for Firefox.

The only way this becomes problematic is with 404's (page not found), since many sites have their own error pages and you have to customize it for each one.

sam-6174
4

It seems to be possible to get the response status code from the log via the API.

from selenium import webdriver
import json
browser = webdriver.PhantomJS()
browser.get('http://www.google.fr')
har = json.loads(browser.get_log('har')[0]['message'])
har['log']['entries'][0]['response']['status']
har['log']['entries'][0]['response']['statusText']
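The snippet above only reads the first HAR entry. As a rough sketch, assuming the log follows the usual HAR layout (`log.entries[].request.url` and `log.entries[].response.status`), a small helper can list every request. The name `har_statuses` is made up for illustration.

```python
def har_statuses(har):
    """Return (url, status) pairs for every entry in a HAR dict."""
    return [
        (entry["request"]["url"], entry["response"]["status"])
        for entry in har["log"]["entries"]
    ]
```

You would pass it the `har` dict parsed above, for example `har_statuses(har)`.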
Mma
  • Is there anything browser specific about "the log" or can that code work on all browsers? – Ywapom Jan 24 '18 at 16:22
  • 1
    I tested it only with PhantomJS. I don't know about IE, but I think it should be possible with Chrome. – Mma Feb 05 '18 at 09:09
  • 4
    I received `selenium.common.exceptions.WebDriverException: Message: unknown error: log type 'har' not found` – etayluz Dec 13 '18 at 17:59
  • @etayluz That means `har` is not defined by you. You can do `capabilities['loggingPrefs'] = {'har': 'ALL'}` to add it. :-) – atb00ker Feb 18 '20 at 18:41
  • 1
    @atb00ker how and where in the code would you fit ```capabilities['loggingPrefs'] = {'har': 'ALL'}```? – Marcin Kulik Dec 11 '20 at 10:08
  • @MarcinKulik do `from selenium.webdriver.common.desired_capabilities import DesiredCapabilities` and you can create a variable `capabilities = DesiredCapabilities.FIREFOX` now you can use this to modify the capabilities of the browser. Hope it helps. This may serve as an example: https://github.com/openwisp/docker-openwisp/blob/3cfe866459d146c0b56ffee7abd500962b21442c/tests/runtests.py – atb00ker Dec 11 '20 at 10:46
  • For the Chrome driver, I added in: `capabilities = DesiredCapabilities.CHROME` `capabilities['goog:loggingPrefs'] = {'har': 'ALL'}` `driver = webdriver.Chrome('./chromedriver', options=options, desired_capabilities=capabilities)` But I still get a `log type 'har' not found` error. – RazorCallahan24 Feb 26 '21 at 23:33
3

To get a status code for a URL using Selenium, you can use JavaScript and an XMLHttpRequest object. The WebDriver class has an execute_async_script() method that you can call to run JavaScript code within the browser:

from selenium import webdriver

driver = webdriver.Chrome(executable_path=r"C:\ChromeDriver\chromedriver.exe")
driver.get('https://stackoverflow.com/')

js = '''
let callback = arguments[0];
let xhr = new XMLHttpRequest();
xhr.open('GET', 'https://stackoverflow.com/', true);
xhr.onload = function () {
    if (this.readyState === 4) {
        callback(this.status);
    }
};
xhr.onerror = function () {
    callback('error');
};
xhr.send(null);
'''

status_code = driver.execute_async_script(js)
print(status_code)    # 200

driver.close()

More information about execute_async_script method.
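If you need this for more than one URL or HTTP method, the script can take them as arguments instead of hard-coding them. This is a sketch under a few assumptions: `xhr_status` and `STATUS_SCRIPT` are hypothetical names, the request is a second, separate request made by the page (so same-origin rules apply), and no request body is sent, so it does not reproduce a real form submit.

```python
STATUS_SCRIPT = """
const url = arguments[0];
const method = arguments[1];
// Selenium appends its callback as the last argument.
const done = arguments[arguments.length - 1];
const xhr = new XMLHttpRequest();
xhr.open(method, url, true);
xhr.onload = function () { done(this.status); };
xhr.onerror = function () { done('error'); };
xhr.send(null);
"""


def xhr_status(driver, url, method="GET"):
    """Ask the browser to request `url` and return the status it saw."""
    return driver.execute_async_script(STATUS_SCRIPT, url, method)
```

Usage would look like `xhr_status(driver, 'https://stackoverflow.com/', 'HEAD')`, called after `driver.get()` has loaded a page on the same origin.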

  • It seems good for GET method. But is there any way to check the response code for a form submit in a page which use POST method? – bhattraideb Apr 28 '21 at 15:07
2

You can also inspect the last message in the log for an error status code: print(browser.get_log('browser')[-1]['message'])
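To pull the number out of such a message, a regular expression is usually enough. The sketch below assumes Chrome-style console text such as "Failed to load resource: the server responded with a status of 404 (Not Found)"; other browsers may word it differently, and `status_from_console_message` is a hypothetical helper name.

```python
import re

# Assumes the console message contains the phrase "status of NNN".
_STATUS_RE = re.compile(r"status of (\d{3})")


def status_from_console_message(message):
    """Return the status code mentioned in a console message, or None."""
    match = _STATUS_RE.search(message)
    return int(match.group(1)) if match else None
```

Note that the browser log only reports failed loads, so a missing match does not tell you the page returned 200.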

etayluz
1

Don't ever say anything isn't possible. The top-voted answer is horrible. There are many other answers that lead to possible solutions, but I will share how I personally implemented this, which is based off of another Stack Overflow answer.

Tested using Google Chrome. The specifics for Firefox or PhantomJS may be a bit different.

I created a method for checking the response status code for any URL that you have visited. I'm sure it could be cleaned up, but it works:

import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities.CHROME
capabilities['goog:loggingPrefs'] = {'performance': 'ALL'}

driver = webdriver.Chrome(desired_capabilities=capabilities)


def get_status_code(url):
    for entry in driver.get_log('performance'):
        for k, v in entry.items():
            if k == 'message' and 'status' in v:
                msg = json.loads(v)['message']['params']
                for mk, mv in msg.items():
                    if mk == 'response':
                        response_url = mv['url']
                        response_status = mv['status']
                        if response_url == url:
                            return response_status


print(get_status_code(driver.current_url))

Output:

200
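Newer Selenium 4 releases no longer accept the `desired_capabilities` keyword, so there the same capability is normally set on an options object instead. A minimal sketch, assuming Selenium 4 with Chrome; `enable_performance_logging` is a hypothetical helper, while `set_capability` is the method Selenium's options classes provide.

```python
def enable_performance_logging(options):
    """Turn on Chrome's performance log on a Selenium options object."""
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    return options


# Intended usage (needs Selenium 4 and a local Chrome, so left commented out):
# from selenium import webdriver
# driver = webdriver.Chrome(options=enable_performance_logging(webdriver.ChromeOptions()))
# driver.get("https://stackoverflow.com/")
# print(get_status_code(driver.current_url))
```

The rest of `get_status_code` should work unchanged, since `driver.get_log('performance')` is called the same way.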

rubynorails
1

In the meantime, there is a library in python called selenium-wire

pip install selenium-wire

It will let you do this for example:

from seleniumwire import webdriver

url = 'https://stackoverflow.com'
driver = webdriver.Chrome()
driver.get(url)

for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )
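If you only care about one page rather than every captured request, you can filter the list. This sketch relies only on the `url`, `response` and `status_code` attributes used above; `status_for` is a made-up name.

```python
def status_for(requests, url):
    """Return the status code of the first captured request matching `url`."""
    for request in requests:
        # Requests that never got an answer have response set to None.
        if request.response and request.url == url:
            return request.response.status_code
    return None
```

For example, `status_for(driver.requests, driver.current_url)`. The captured URL may differ from the one you typed (trailing slash, redirects), so comparing against `driver.current_url` is an assumption you should check for your site.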
Ilir
0

I've been surfing the net for about 3 hours and haven't found a single way to do that with WebDriver. I've never worked with Selenium directly. The only suggestion that comes to mind is to use the "requests" module like this:

import requests
from selenium import webdriver

driver = webdriver.Chrome()  # any browser driver will do
driver.get("url")
r = requests.get("url")
print(r.status_code)

A complete tutorial on using requests is here, and you can install the module with the command pip install requests.

But there is a problem, which may not always occur: the driver's response and the requests response are not the same. You only get the status code of the requests call, so if the URL's responses are not stable, it will probably give wrong results.

Chankey Pathak
Zeinab Abbasimazar
  • 7
    So problematic. On top of all those problems, you GET the request twice, and further, this can't be used if you plan on having Selenium GET or POST urls. – mlissner Aug 06 '14 at 14:02
  • 2
    Thanks for nice answer. https://pypi.python.org/pypi/selenium-requests/ also does the same stuff. – Nafis Ahmad Dec 05 '15 at 06:08
  • 1
    Unfortunately, some websites will give different response codes to these different modules. Something to be aware of (try user-agent spoofing if you're relying on requests for status codes used in Selenium) – Alex P. Miller Jun 27 '16 at 23:05
  • Actually, it's a good idea to use an HTTP request and determine whether the URL is valid or not. – Afshin Mehrabani Dec 18 '16 at 19:53
  • When a page blocks direct requests and only allows loading the page via a browser, you get 403, so not helpful in this case – Peter Dec 22 '20 at 12:03
0

I'm using Java here as I haven't got much experience in Python. Also, I don't know how to get only the HTTP status codes. The following will give you the entire network traffic, from which you can capture the status codes.

First start your server as

selenium.start("captureNetworkTraffic=true");

Then capture your traffic as

String traffic = selenium.captureNetworkTraffic("xml");

You can get output in json as well.

9ikhan
-1

YOU CAN GET STATUS CODE FROM THE TITLE

For example, 403 Forbidden response from nginx.

<html>
    <head>
        <title>403 Forbidden</title>
    </head>
    <body></body>
</html>

Selenium code:

text = driver.title  # the <title> element is not rendered, so its .text is empty
if '403 Forbidden' in text:
    print('[INFO] status code is 403')

Of course, this solution does not cover all cases.
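To handle more than one hard-coded title, you could extract any leading 4xx/5xx number. This is a sketch that assumes the server's default error page puts the code at the start of the title, as nginx does in the example above; `status_from_title` is a hypothetical helper.

```python
import re


def status_from_title(title):
    """Return a leading 4xx/5xx code from an error-page title, or None."""
    match = re.match(r"\s*([45]\d{2})\b", title)
    return int(match.group(1)) if match else None
```

Call it as `status_from_title(driver.title)`. A `None` result only means the title does not look like a default error page; sites with custom error pages will not be detected.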

Elefanobi
-2

I used the following trick by using requests to make sure that server is responding first. Then I used driver:

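import requests
from bs4 import BeautifulSoup

# Poll with requests until the server answers 200 (this loops forever if it never does).
resp = requests.get(link)
while resp.status_code != 200:
    resp = requests.get(link)

driver.get(link)
html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')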
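Since that loop never gives up, a bounded version may be safer. The sketch below is generic on purpose: `wait_for_ok` is a made-up helper, and `get_status` is any callable that returns a status code.

```python
import time


def wait_for_ok(get_status, attempts=5, delay=1.0):
    """Call get_status() until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if get_status() == 200:
            return True
        # Pause between tries, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With requests this would be `wait_for_ok(lambda: requests.get(link).status_code)`, followed by `driver.get(link)` only when it returns True.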
Yasin