
I have a chrome extension that is meant to intercept response body data, manipulate it into useful statistics and then render these stats on the page.

The problem is that the request interceptor loads after the most important requests have already been sent and received, so it cannot scrape them (see the attached screenshots of the Chrome Network tab, which should make this clearer).

Currently, my interceptor runs in a content.js script that is matched to the urls I'm interested in.


INTERCEPTOR.JS

const listenerFn = (event) => {
    console.log("Url: ", event.target.responseURL);
    // DO OTHER STUFF
}

(
    () => {
        var XHR = XMLHttpRequest.prototype;
        var send = XHR.send;
        XHR.send = function() {
            this.addEventListener('load', listenerFn)
            return send.apply(this, arguments);
        };
    }
)();

My manifest (V3) looks like this:


MANIFEST.JSON

{
    "manifest_version": 3,
    .....
    "background": {
        "service_worker": "background.js"
    },
    "content_scripts": [
        {
          "matches": ["URL_OF_INTEREST.com/*"],
          "js": ["interceptor.js"],
          "run_at": "document_start"
        }
    ],
    "web_accessible_resources": [
        {
            "matches": ["<all_urls>"],
            "resources": ["interceptor.js"]
        }
    ],
    "permissions": [
        "scripting",
        "activeTab",
        "declarativeContent",
        "storage",
        "tabs"
    ],
    "host_permissions": [
        "*://*/*"
      ]
}

I feel like my options are either:

  1. Work out how to inject the interceptor script faster
  2. Delay page requests until the interceptor has been fully injected (impossible?)
  3. Try using chrome.webRequest somehow

I don't know if any of these are possible. 1 - I don't think the interceptor can be injected sooner in the current setup (I think I've done all I can by setting run_at). 2 - I don't even know if this can be done. 3 - I believe webRequest doesn't give access to response bodies.

Someone mentioned to me that as the code is not related to page content it may be possible to have this code run in the background.js script. So maybe this is a good avenue to explore.

I've attached two images below showing the network tab from chrome dev tools.

In the first image (which shows only XHRs), the green arrow is the request that I need to scrape, the purple bracket covers requests that haven’t been intercepted and the yellow ones are requests that have. The colours are similar in the second image (which shows both XHRs and JS files), but this includes a blue arrow showing when the interceptor.js file has been run.

[Screenshot: Network tab (XHR only)]
[Screenshot: Network tab (XHR and JS files)]

Any suggestions or guidance would be greatly appreciated. If anyone needs any additional information, just let me know.

Thanks!

wOxxOm
Maximilian

1 Answer


The problem is that your interceptor script is loaded via a DOM script element's src, so it loads asynchronously and runs after the other scripts loaded by the page.

The solution is to register the injected script at document_start directly in the context of the page (aka the MAIN world). There are two ways to do it, shown below.

You won't need the content script that creates the script element.

1. "world" in manifest.json [Chrome 111+]

  "content_scripts": [{
    "matches": ["*://*.example.com/*"],
    "js": ["interceptor.js"],
    "run_at": "document_start",
    "world": "MAIN"
   }]
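
With either registration method, interceptor.js runs in the page's own context, so it can patch XMLHttpRequest before the page's scripts use it. The following is only a minimal sketch of what such a file might look like, assuming the page uses XMLHttpRequest (not fetch); installXhrInterceptor and onResponse are hypothetical names chosen for illustration.

```javascript
// interceptor.js: hypothetical sketch, intended to run in the page's MAIN world.
// installXhrInterceptor and onResponse are illustrative names, not part of any API.
function installXhrInterceptor(XHRClass, onResponse) {
  const proto = XHRClass.prototype;
  const originalSend = proto.send;

  proto.send = function (...args) {
    // Attach the listener before the request is sent so no response is missed.
    this.addEventListener('load', () => {
      let body = null;
      // responseText is only readable when responseType is '' or 'text'.
      if (this.responseType === '' || this.responseType === 'text') {
        body = this.responseText;
      }
      onResponse({ url: this.responseURL, status: this.status, body });
    });
    return originalSend.apply(this, args);
  };

  // Return a function that restores the original method.
  return () => { proto.send = originalSend; };
}

// In the page this runs once at document_start; the guard lets the file load elsewhere.
if (typeof XMLHttpRequest !== 'undefined') {
  installXhrInterceptor(XMLHttpRequest, ({ url, body }) => {
    console.log('Url:', url, 'length:', body === null ? 'n/a' : body.length);
  });
}
```

Passing the class in as a parameter is a design choice made here so the patching logic can be exercised against a stand-in class outside the browser.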

2. chrome.scripting.registerContentScripts [Chrome 102+]

  1. Remove web_accessible_resources and the code in the content script that loads interceptor.js

  2. Add permissions and host_permissions in manifest.json:

      "permissions": ["scripting"],
      "host_permissions": ["*://*.example.com/*"],
      "background": { "service_worker": "background.js" },
    
  3. Add the following code to your background.js:

    chrome.runtime.onInstalled.addListener(async () => {
      const scripts = [{
        id: 'interceptor',
        js: ['interceptor.js'],
        matches: ['*://*.example.com/*'],
        runAt: 'document_start',
        world: 'MAIN',
      }];
      const ids = scripts.map(s => s.id);
      await chrome.scripting.unregisterContentScripts({ids}).catch(() => {});
      await chrome.scripting.registerContentScripts(scripts).catch(() => {});
    });
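
As an alternative to unregistering and re-registering, one could ask Chrome which scripts are already registered and register only the missing ones. This is a rough sketch, not the approach above: registerMissingScripts is a hypothetical helper, and unlike the unregister/register pair it does not update a script whose definition has changed.

```javascript
// background.js: hypothetical helper; registerMissingScripts is an illustrative name.
// `scripting` is passed in (chrome.scripting inside the extension) so the logic is testable.
async function registerMissingScripts(scripting, scripts) {
  const ids = scripts.map(s => s.id);
  // Returns only the already-registered scripts whose id is in `ids`.
  const existing = await scripting.getRegisteredContentScripts({ ids });
  const existingIds = new Set(existing.map(s => s.id));
  const missing = scripts.filter(s => !existingIds.has(s.id));
  if (missing.length > 0) {
    await scripting.registerContentScripts(missing);
  }
  // Report which ids were newly registered.
  return missing.map(s => s.id);
}

// Only wire this up when running inside an extension service worker.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onInstalled) {
  chrome.runtime.onInstalled.addListener(() => {
    registerMissingScripts(chrome.scripting, [{
      id: 'interceptor',
      js: ['interceptor.js'],
      matches: ['*://*.example.com/*'],
      runAt: 'document_start',
      world: 'MAIN',
    }]).catch(console.error);
  });
}
```

Logging the error instead of swallowing it makes a missing `scripting` permission or host permission easier to spot.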
    

P.S.

Another problem is that you load the same file in content_scripts and web_accessible_resources. You should use two different scripts because they run in two different contexts ("worlds"). To communicate between them you can use CustomEvent messaging (example).
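
A minimal sketch of CustomEvent messaging between the two worlds might look like this. The event name and both function names are hypothetical; in the extension, `target` would be `document` in both scripts, since the DOM is shared between the MAIN world and the content script's isolated world.

```javascript
// Hypothetical event name; pick something unlikely to collide with the page's own events.
const STATS_EVENT = 'my-extension:stats';

// MAIN-world side (e.g. interceptor.js): publish data on a shared event target.
function publishStats(target, stats) {
  // Serialize to a string so the payload does not depend on how objects cross worlds.
  target.dispatchEvent(new CustomEvent(STATS_EVENT, { detail: JSON.stringify(stats) }));
}

// Isolated-world side (a separate content script): receive and parse the data.
function subscribeStats(target, handler) {
  const listener = (event) => handler(JSON.parse(event.detail));
  target.addEventListener(STATS_EVENT, listener);
  // Return a function that stops listening.
  return () => target.removeEventListener(STATS_EVENT, listener);
}
```

The subscriber has to be attached before the first event is dispatched, so the listening content script should also run at document_start. Note that the page itself can observe and dispatch these events, so the payload should not be treated as trusted.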

And lastly, the site may be using iframes to make the requests, in which case you need to add "all_frames": true to your content script's declaration in manifest.json, and possibly "match_origin_as_fallback": true in case the iframe is about:blank or doesn't have any src.
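
For illustration, the iframe-related keys could be combined with the manifest declaration from method 1 roughly as follows; the match pattern is the same placeholder used above, and whether both keys are needed depends on how the site creates its frames.

```json
"content_scripts": [{
  "matches": ["*://*.example.com/*"],
  "js": ["interceptor.js"],
  "run_at": "document_start",
  "world": "MAIN",
  "all_frames": true,
  "match_origin_as_fallback": true
}]
```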

wOxxOm
  • Is `unregisterContentScripts` mandatory? I mean, the extension has just been installed / re-installed. – frouo Sep 05 '22 at 06:45
  • IIRC it may not unregister in some cases as there's a bug in Chrome about it. – wOxxOm Sep 05 '22 at 09:03
  • You provide an answer but don't really explain why this solution should work, and I am confused. How is this any different than declaring the content script in the manifest? – X33 Jan 11 '23 at 23:45
  • @X33, not sure what you mean, it's already explained, but here's more: this solution is using an entirely different mechanism that executes the script in the MAIN world at the document_start timing correctly i.e. synchronously, not asynchronously. Currently we can't specify `"world":"MAIN"` in `content_scripts` in manifest.json, it's only implemented in Chrome 111. For more info about worlds, see [this answer](/a/9517879). – wOxxOm Jan 12 '23 at 07:42
  • Scripts don't even execute with your suggested method – X33 Jan 13 '23 at 12:53
  • They do. I guess you didn't add `scripting` in `permissions` or the site to the `host_permissions`. I'll add this to the answer. – wOxxOm Jan 13 '23 at 14:14
  • So this seems to only work when I change the specific URL to "" – X33 Jan 13 '23 at 15:50
  • You can use a specific URL. – wOxxOm Jan 13 '23 at 15:52