https://www.sahibinden.com/en

If you open it in an incognito window and check the headers in Fiddler, these are the two main requests you get: [screenshot]

When I click the last one and check the request headers, this is what I get: [screenshot]

I want to get these headers in Python. Is there any way I can get them using Selenium? I'm a bit clueless here.

user3102085
  • https://stackoverflow.com/questions/58170965/how-to-use-requests-library-with-selenium-in-python this may be useful for you! – lam vu Nguyen Feb 11 '23 at 19:13

7 Answers

26

You can use Selenium Wire. It is a Selenium extension that has been developed for this exact purpose.

https://pypi.org/project/selenium-wire/

An example after pip install:

##  Import webdriver from Selenium Wire instead of Selenium
from seleniumwire import webdriver

##  Get the URL (assumes chromedriver is on your PATH)
driver = webdriver.Chrome()
driver.get("https://my.test.url.com")

##  Print request headers
for request in driver.requests:
    print(request.url)      # <--------------- Request url
    print(request.headers)  # <----------- Request headers
    if request.response:    # response can be None if it never arrived
        print(request.response.headers)  # <-- Response headers
Konemiees
  • Any chance you might know if something similar exists for ruby? :) – 8bithero Oct 20 '20 at 17:53
  • 1
    Unfortunately I've only used selenium with Python :/ – Konemiees Oct 21 '20 at 13:00
  • 1
    What the above package is doing is setting up a MITM proxy on Selenium. You can do this manually using mitmproxy; also, you might need to install the certificate in the Selenium browser in order for it to function properly – ahmed mani Dec 25 '21 at 02:45
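As the comment above suggests, the same effect can be achieved manually with mitmproxy: run it as a local proxy, point the Selenium browser at it with --proxy-server, and log headers from an addon script. A rough sketch (the file name log_headers.py is just an example; run it with mitmdump -s log_headers.py):

```python
# log_headers.py -- a minimal mitmproxy addon sketch that prints
# the headers of every request the proxied browser makes.

def headers_as_dict(flow):
    """Return the request headers of a mitmproxy flow as a plain dict."""
    return dict(flow.request.headers)

def request(flow):
    # mitmproxy calls this hook once for every outgoing request
    print(flow.request.url)
    print(headers_as_dict(flow))
```

The Selenium side then only needs chrome_options.add_argument('--proxy-server=http://localhost:8080'), plus trusting mitmproxy's certificate (or --ignore-certificate-errors for testing).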
11

You can run a JS command like this:

var req = new XMLHttpRequest()
req.open('GET', document.location, false)
req.send(null)
return req.getAllResponseHeaders()

In Python:

driver.get("https://t.me/codeksiyon")
headers = driver.execute_script("var req = new XMLHttpRequest();req.open('GET', document.location, false);req.send(null);return req.getAllResponseHeaders()")

# type(headers) == str

headers = headers.splitlines()
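If a dict is more convenient than a list of lines, the raw string returned by getAllResponseHeaders() can be parsed in Python. A small sketch (the helper name parse_headers is my own):

```python
# Parse the raw header string returned by XMLHttpRequest's
# getAllResponseHeaders() into a Python dict.
def parse_headers(raw: str) -> dict:
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue                          # skip blank trailing lines
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return headers

raw = "content-type: text/html; charset=utf-8\r\ncontent-length: 1361"
print(parse_headers(raw))
```

Note that browsers lower-case the header names in this string, so look keys up in lower case.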
raifpy
7

The bottom line is: no, you can't retrieve the request headers using Selenium.


Details

Adding WebDriver methods to read the HTTP status code and headers from an HTTP response had long been demanded by Selenium users. Implementing this feature was discussed at length in WebDriver lacks HTTP response header and status code methods.

However, Jason Leyba (Selenium contributor) stated plainly in his comment:

We will not be adding this feature to the WebDriver API as it falls outside of our current scope (emulating user actions).

Ashley Leyba further added that attempting to make WebDriver the ideal web testing tool would hurt its overall quality, since driver.get(url) blocks until the browser has loaded the page and returns the response for the final loaded page. So in the case of a login redirect, the status codes and headers will always end up as 200 instead of the 302 you're looking for.

Finally, Simon M Stewart (WebDriver creator) in his comment concluded that:

This feature isn't going to happen. The recommended approach is to either extend the HtmlUnitDriver to access the information you require or to make use of an external proxy that exposes this information, such as the BrowserMob Proxy.

undetected Selenium
1
A variant of the XMLHttpRequest approach that returns the headers directly as a dict:

js_headers = '''
    const _xhr = new XMLHttpRequest();
    _xhr.open("HEAD", document.location, false);
    _xhr.send(null);

    const _headers = {};

    _xhr.getAllResponseHeaders().trim().split(/[\\r\\n]+/).map((value) => value.split(/: /)).forEach((keyValue) => {
        _headers[keyValue[0].trim()] = keyValue[1].trim();
    });

    return _headers;
'''

page_headers = driver.execute_script(js_headers)

type(page_headers) # -> dict
keyiflerolsun
0

Maybe you can use BrowserMob Proxy for this. Here is an example:

import settings

from browsermobproxy import Server
from selenium.webdriver import DesiredCapabilities

config = settings.Config

server = Server(config.BROWSERMOB_PATH)
server.start()
proxy = server.create_proxy()

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % proxy.proxy)
chrome_options.add_argument('--headless')

capabilities = DesiredCapabilities.CHROME.copy()
capabilities['acceptSslCerts'] = True
capabilities['acceptInsecureCerts'] = True

driver = webdriver.Chrome(options=chrome_options,
                          desired_capabilities=capabilities,
                          executable_path=config.CHROME_PATH)

proxy.new_har("sahibinden", options={'captureHeaders': True})
driver.get("https://www.sahibinden.com/en")

entries = proxy.har['log']["entries"]
for entry in entries:
    if 'request' in entry.keys():
        print(entry['request']['url'])
        print(entry['request']['headers'])
        print('\n')

proxy.close()
driver.quit()
server.stop()
Celso Jr
0

You can use https://pypi.org/project/selenium-wire/, a drop-in replacement for webdriver that adds request/response inspection and manipulation, even for HTTPS, by using its own local SSL certificate.

from seleniumwire import webdriver
d = webdriver.Chrome() # make sure chrome/chromedriver is in path
d.get('https://en.wikipedia.org')
vars(d.requests[-1].headers)

This lists the headers of the last entry in the requests list:

{'policy': Compat32(), '_headers': [('content-length', '1361'), 
('content-type', 'application/json'), ('sec-fetch-site', 'none'), 
('sec-fetch-mode', 'no-cors'), ('sec-fetch-dest', 'empty'), 
('user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36'), 
('accept-encoding', 'gzip, deflate, br')],
'_unixfrom': None, '_payload': None, '_charset': None,
'preamble': None, 'epilogue': None, 'defects': [], '_default_type': 'text/plain'}
MortenB
-1

It's not possible to get headers using Selenium. Further information

However, you might use other libraries, such as requests to fetch the page and read headers, and BeautifulSoup to parse the HTML.
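With requests you can also inspect the exact request headers before (or after) sending, which is what the question asks for. A minimal sketch; the User-Agent value here is just an example:

```python
import requests

# Build a request without sending it, then inspect the headers
# that would be transmitted.
req = requests.Request(
    "GET",
    "https://www.sahibinden.com/en",
    headers={"User-Agent": "Mozilla/5.0"},  # example header
)
prepared = req.prepare()

print(prepared.url)
print(dict(prepared.headers))  # the request headers that would be sent
```

After a live call like resp = requests.get(url), resp.request.headers shows the request headers actually sent, and resp.headers the response headers.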

Baris
  • These headers are shown only in an incognito window when the site is first visited. Then cookies are stored and this page is not visited. Would BS be able to capture those headers every time it is run? – user3102085 Jun 08 '20 at 12:47
  • Can you share any resources? – user3102085 Jun 08 '20 at 12:49
  • You can use requests to get HTML content and headers. It doesn't cache by default. (https://stackoverflow.com/questions/20198274/how-do-i-clear-cache-with-python-requests) Then, you can parse this HTML with BeautifulSoup if necessary. – Baris Jun 08 '20 at 13:12
  • I need request headers and not response headers – user3102085 Jun 08 '20 at 13:13
  • Can I intercept requests being made to url? – user3102085 Jun 08 '20 at 13:14