
I want to listen to the Network events (basically all of the activity that you can see when you go to the Network tab on Chrome's Developer Tools / Inspect) and record specific events when a page is loaded via Python.

Is this possible? Thanks!

Specifically:

  • go to webpage.com
  • open Chrome Dev Tools and go to the Network tab
  • add api.webpage.com as a filter
  • refresh page [scroll]

I want to be able to capture the names of these events because there are specific IDs that aren't available via the UI.

Zach
  • What information are you trying to record? – guest271314 Feb 01 '19 at 14:33
  • @guest271314 XHR events -- the webpage calls their internal API and I need to capture specific IDs – Zach Feb 01 '19 at 14:36
  • You can use `PerformanceObserver`, see [Detect ajax requests from raw HTML](https://stackoverflow.com/questions/45406906/detect-ajax-requests-from-raw-html/45407041#45407041) – guest271314 Feb 01 '19 at 14:39
  • How is scrolling related to recording network requests? What do you mean by _"the names of these events"_? – guest271314 Feb 01 '19 at 14:44
  • @guest271314 it's not directly related, but the site uses an infinite scroll so more events are populated when you scroll -- you don't have to do it, but you can see more when you do – Zach Feb 01 '19 at 14:46
  • Do the linked answers not resolve the question? How is Python related to the question? – guest271314 Feb 01 '19 at 14:47
  • @guest271314 I'm not sure tbh... I can try it but I'm much less familiar with JS (was hoping there was a way to get this info from the `requests` or `urllib3` library in Python) so it will take some time :) – Zach Feb 01 '19 at 14:52
  • Have you tried using `PerformanceObserver`? – guest271314 Feb 01 '19 at 14:54
  • @Zach: If you're *simulating* a web browser loading the page (e.g. by sending HTTP requests directly using the `requests` Python module), then you need to also simulate the web browser executing the web application code to get any interactive behavior to happen. The usual way to do this is to use a headless web browser (e.g. using Selenium). – Daniel Pryden Feb 01 '19 at 15:10
  • @DanielPryden ya that's what I figured... I just don't know how to do that haha and was hoping someone could point me to the documentation on how I might. I tried looking for it but I don't think I was searching for the right thing :/ And btw, I'm actually using Scrapy, so if you know of anything built for that framework it would be preferred to using Selenium inside of Scrapy. – Zach Feb 01 '19 at 20:24

3 Answers


Update 2021: I had to make a few changes to Zach's answer to make it work. Comments marked with ### are mine.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

def get_perf_log_on_load(url, headless=True, filter=None):

    # init Chrome driver (Selenium)
    options = Options()
    options.add_experimental_option("w3c", False)  ### added this line
    options.headless = headless
    cap = DesiredCapabilities.CHROME
    cap["loggingPrefs"] = {"performance": "ALL"}
    ### installed chromedriver.exe and pointed Selenium at its path
    driver = webdriver.Chrome(
        r"C:\Users\asiddiqui\Downloads\chromedriver_win32\chromedriver.exe",
        desired_capabilities=cap,
        options=options,
    )

    # record and parse the performance log
    driver.get(url)
    if filter:
        log = [item for item in driver.get_log("performance") if filter in str(item)]
    else:
        log = driver.get_log("performance")
    driver.close()

    return log
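Each entry that `get_log("performance")` returns wraps a JSON string describing a DevTools event, so the IDs usually have to be dug out of the decoded messages rather than read off the raw entries. A minimal parsing sketch (the `extract_request_urls` helper and its `url_filter` parameter are my own names, not part of Selenium; the event structure follows the DevTools protocol's `Network.requestWillBeSent`):

```python
import json

def extract_request_urls(perf_log, url_filter=None):
    """Decode Selenium performance-log entries and collect request URLs."""
    urls = []
    for entry in perf_log:
        # each entry["message"] is a JSON string with a nested "message" object
        message = json.loads(entry["message"])["message"]
        if message.get("method") == "Network.requestWillBeSent":
            url = message["params"]["request"]["url"]
            if url_filter is None or url_filter in url:
                urls.append(url)
    return urls

# e.g. urls = extract_request_urls(log, url_filter="api.webpage.com")
```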
ar-siddiqui
  • If you're getting an error like "'performance' log not found", try this: https://stackoverflow.com/questions/53049026/selenium-chrome-performance-logs-not-working The change is basically to use cap["goog:loggingPrefs"] – jeffsdata Jul 20 '21 at 16:30

Although it didn't completely answer the question, @mihai-andrei's answer got me the closest.

If anyone is looking for a Python solution, the following code should do the trick:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options

def get_perf_log_on_load(url, headless=True, filter=None):

    # init Chrome driver (Selenium)
    options = Options()
    options.headless = headless
    cap = DesiredCapabilities.CHROME
    cap['loggingPrefs'] = {'performance': 'ALL'}
    driver = webdriver.Chrome(desired_capabilities=cap, options=options)

    # record and parse the performance log
    driver.get(url)
    if filter:
        log = [item for item in driver.get_log('performance') if filter in str(item)]
    else:
        log = driver.get_log('performance')
    driver.close()

    return log
Zach
  • This worked for me. Logging the performance was the key. BTW Selenium 4.0 will officially support accessing Chrome DevTools info, but it's still in beta. I haven't tried it, but you can also use 4.0 for the same purpose. – user8491363 Jul 13 '21 at 13:24

You could sidestep Chrome and use a scriptable proxy like mitmproxy: https://mitmproxy.org/

Another idea is to use Selenium to drive the browser and read the events from the performance logs: https://sites.google.com/a/chromium.org/chromedriver/logging/performance-log
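The mitmproxy route can itself be scripted in Python as an addon. A minimal sketch, assuming the target host is `api.webpage.com` as in the question (`response` is mitmproxy's addon hook; `is_api_request` is a plain helper of my own so the filtering logic stays testable on its own):

```python
# capture.py -- run with: mitmdump -s capture.py

def is_api_request(host, target="api.webpage.com"):
    """Return True when the request host belongs to the API we care about."""
    return target in host

def response(flow):
    """mitmproxy hook: called once per completed request/response pair."""
    if is_api_request(flow.request.pretty_host):
        # log the calls the page makes to its internal API
        print(flow.request.method, flow.request.url, flow.response.status_code)
```

Every request the browser makes then flows through the proxy, scrolling included, so infinite-scroll traffic is captured without driving the browser from Python at all.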

Mihai Andrei