I am using Python and Django to build a web app. I use Selenium to launch a headless browser (PhantomJS) and perform some clicks until I reach a particular page. I want to capture the network traffic and get the response to a particular network call. The response to this call is an HTML document.

Is there any way to achieve this?

Rich Rajah

1 Answer

You can access either the browser log or the chromedriver log; they differ slightly in how they report network responses. The browser log is called performance and the driver log is called driver. Each returns a list of JSON-like entries, which you can parse to extract the events whose method belongs to the Network domain:

{'level': 'INFO',
  'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113832},
 {'level': 'INFO',
  'message': '{"message":{"method":"Page.frameDetached","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113838},
 {'level': 'INFO',
  'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response","frameId":"C2D13BD13CF743B6D0695B35E9CC935C","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"5331BFDC4F466FCED920CFC9F033D2EC","request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"},"initialPriority":"VeryHigh","method":"GET","mixedContentType":"none","referrerPolicy":"no-referrer-when-downgrade","url":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response"},"requestId":"5331BFDC4F466FCED920CFC9F033D2EC","timestamp":104499.729,"type":"Document","wallTime":1538607113.838206}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113839},...}

You need to enable logging in DesiredCapabilities and then parse the log entries with the json module:

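import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Copy the defaults so the shared DesiredCapabilities.CHROME dict is not mutated.
caps = DesiredCapabilities.CHROME.copy()
# Ask Chrome to record DevTools events in the 'performance' log.
caps['goog:loggingPrefs'] = {'performance': 'ALL'}
# Note: Selenium 4.10+ removed the desired_capabilities argument; there, set the
# same capability with ChromeOptions.set_capability() and pass options=... instead.
driver = webdriver.Chrome(desired_capabilities=caps)
driver.get('https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response')

def process_browser_log_entry(entry):
    # Each entry's 'message' field is a JSON string wrapping the DevTools event.
    return json.loads(entry['message'])['message']

browser_log = driver.get_log('performance')
events = [process_browser_log_entry(entry) for entry in browser_log]
# Keep only response events such as Network.responseReceived.
events = [event for event in events if 'Network.response' in event['method']]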

I don't know whether you can access the response data itself this way, but you can get the URL of the response.
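
To make the URL extraction concrete, here is a minimal sketch that pulls the request ID, URL, status and MIME type out of `Network.responseReceived` events. The helper name `summarize_responses` and the sample entry are hypothetical and only for illustration; the field names follow the Chrome DevTools Protocol event layout, and the function expects the raw entries returned by `driver.get_log('performance')`.

```python
import json


def summarize_responses(log_entries, url_fragment=""):
    """Return one summary dict per Network.responseReceived event.

    log_entries: raw entries as returned by driver.get_log('performance').
    url_fragment: optional substring the response URL must contain.
    """
    summaries = []
    for entry in log_entries:
        # The 'message' field is a JSON string wrapping the DevTools event.
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.responseReceived":
            continue
        params = event["params"]
        response = params["response"]
        if url_fragment not in response["url"]:
            continue
        summaries.append({
            "requestId": params["requestId"],
            "url": response["url"],
            "status": response["status"],
            "mimeType": response["mimeType"],
        })
    return summaries


# Hypothetical entry shaped like the ones in the performance log.
sample_entry = {
    "level": "INFO",
    "message": json.dumps({
        "message": {
            "method": "Network.responseReceived",
            "params": {
                "requestId": "1000.1",
                "response": {
                    "url": "https://example.com/page.html",
                    "status": 200,
                    "mimeType": "text/html",
                },
            },
        },
        "webview": "ABC",
    }),
    "timestamp": 0,
}

print(summarize_responses([sample_entry], url_fragment="page.html"))
```

Filtering on `mimeType` (for example `text/html`) is one way to narrow the list down to the call that returns an HTML document.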

Another option is to use a library like selenium-wire.
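
As a rough sketch of that route: selenium-wire drivers expose the captured traffic as `driver.requests`, where each request has a `url` and, once answered, a `response` with a `body`. The helper name `find_response_body` is hypothetical, and the body is returned as bytes that may still be compressed, depending on the response's Content-Encoding header.

```python
def find_response_body(driver, url_fragment):
    """Return the body (bytes) of the first captured response whose URL matches.

    Assumes `driver` is a selenium-wire driver, i.e. it exposes `driver.requests`.
    """
    for request in driver.requests:
        # request.response is None when no response was captured.
        if request.response is not None and url_fragment in request.url:
            return request.response.body
    return None


# Usage sketch (needs the selenium-wire package and a browser driver installed):
# from seleniumwire import webdriver
# driver = webdriver.Chrome()
# driver.get("https://example.com/")
# html_bytes = find_response_body(driver, "example.com")
```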

UPDATE 2020-10-07 ⬇

As @Roey B and @Inactivist explain in the comments, you can access the response body using the Network.getResponseBody command:

driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': events[0]["params"]["requestId"]})
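
Putting the pieces together, a minimal sketch might look like the following. The helper name `get_response_body` is hypothetical, and it assumes a Chromium-based driver with the performance log enabled. `Network.getResponseBody` returns a dict with `body` and `base64Encoded` keys, so binary payloads need decoding.

```python
import base64
import json


def get_response_body(driver, url_fragment):
    """Return the body of the first response whose URL contains url_fragment.

    Assumes a Chromium-based driver with the 'performance' log enabled.
    Returns str for text bodies, bytes for base64-encoded ones, or None.
    """
    for entry in driver.get_log("performance"):
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.responseReceived":
            continue
        params = event["params"]
        if url_fragment not in params["response"]["url"]:
            continue
        result = driver.execute_cdp_cmd(
            "Network.getResponseBody", {"requestId": params["requestId"]}
        )
        # Binary payloads come back base64-encoded; text comes back as-is.
        if result.get("base64Encoded"):
            return base64.b64decode(result["body"])
        return result["body"]
    return None
```

Two caveats, as far as I know: `get_log` only returns entries recorded since the previous call, so read the log once and reuse it, and the CDP command may raise an error if the browser no longer holds the body for that request.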
hellpanderr
  • Note: If you're having trouble getting the performance log to work on recent (~75+) Chrome, see here: https://stackoverflow.com/a/56536604/5368039. Basically just change `loggingPrefs` to `goog:loggingPrefs` – Jeremy Weirich Aug 05 '19 at 19:18
  • To get the response data you can run this: `driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': msg["message"]["params"]["requestId"]})` – Roey B May 26 '20 at 15:44
  • @RoeyB what goes in `msg`? – hellpanderr May 26 '20 at 17:35
  • @RoeyB, could you please explain what `msg` stands for here? Thanks! – johnanish Jun 16 '20 at 08:03
  • `msg` is one of the parsed JSON log entries returned by `driver.get_log('performance')`. See also [selenium.webdriver.chrome.webdriver.execute_cdp_cmd](https://www.selenium.dev/selenium/docs/api/py/webdriver_chrome/selenium.webdriver.chrome.webdriver.html#selenium.webdriver.chrome.webdriver.WebDriver.execute_cdp_cmd) – Inactivist Jul 10 '20 at 01:53
  • @JeremyWeirich why don't you just edit the answer? – Smit Johnth Oct 30 '20 at 14:14
  • Here's a working example to extract JSON requests: https://gist.github.com/lorey/079c5e178c9c9d3c30ad87df7f70491d – Karl Lorey Nov 03 '20 at 17:05
  • Can someone help me with how to do this with the Firefox browser? – shajahan Jun 01 '21 at 23:03
  • From Selenium 4.0 onward the library will support accessing devtools information. It's still in beta state though. – user8491363 Jul 13 '21 at 13:32