
I'm using the Selenium Python WebDriver to browse some pages. I want to inject JavaScript code into a page before any other JavaScript code gets loaded and executed. In other words, I need my JS code to be executed as the first JS code on that page. Is there a way to do that with Selenium?

I googled it for a couple of hours, but I couldn't find any proper answer!

Alex
  • But my question is how I can inject JS code using Selenium WebDriver before the page loads. I don't have access to the content of those pages, so I cannot inject JS code into them unless I use a proxy to rewrite the page content. – Alex Jul 11 '15 at 08:54
  • 3
    I think, I have found the answer. According to http://grokbase.com/t/gg/selenium-users/12a99543jq/is-there-a-way-to-inject-javascripts-before-page-onload, We can not do that unless we use a proxy to inject a script at the beginning of the page. – Alex Jul 11 '15 at 15:52
  • 1
    Would you be able to install an application such as GreaseMonkey or Tampermonkey to inject your scripts? https://addons.mozilla.org/en-us/firefox/addon/greasemonkey/ – Brian Cain Nov 08 '15 at 20:17
  • Yep, you can do it with your own extension or GreaseMonkey. – Alex Nov 09 '15 at 01:04
  • If you are not using a physical display and are using something like PhantomJS, you can get the DOM of the target page. Next, you can traverse the DOM, inject your script, and add an `onLoad` trigger to execute the script on page load. As I see it, this is one of the most straightforward ways. – Abhinav Nov 17 '15 at 15:38

5 Answers


Selenium now supports the Chrome DevTools Protocol (CDP) API, so it is really easy to execute a script on every page load. Here is example code for that:

driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': 'alert("Hooray! I did it!")'})

It will execute that script on every page load, before the page's own scripts run. More information about this can be found at:
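A slightly fuller sketch of the same call, assuming a Chromium-based driver (Chrome or Edge), which is where Selenium exposes `execute_cdp_cmd`. The helper names `add_early_script` and `remove_early_script` and the `window.__injectedFirst` flag are hypothetical. The CDP command `Page.addScriptToEvaluateOnNewDocument` returns an `identifier`, which `Page.removeScriptToEvaluateOnNewDocument` accepts to stop the injection again.

```python
def add_early_script(driver, source: str) -> str:
    """Register `source` to run in every new document, before the page's own scripts.

    Assumes a Chromium-based Selenium driver (Chrome/Edge), which provides execute_cdp_cmd.
    Returns the CDP script identifier so the script can be unregistered later.
    """
    result = driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source}
    )
    return result["identifier"]


def remove_early_script(driver, identifier: str) -> None:
    """Stop injecting a script previously registered with add_early_script."""
    driver.execute_cdp_cmd(
        "Page.removeScriptToEvaluateOnNewDocument", {"identifier": identifier}
    )


if __name__ == "__main__":
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        # Hypothetical flag; it is set before any of the page's own scripts execute.
        script_id = add_early_script(driver, "window.__injectedFirst = true;")
        driver.get("https://example.com")
        print(driver.execute_script("return window.__injectedFirst === true"))
        remove_early_script(driver, script_id)
    finally:
        driver.quit()
```

The registered script is evaluated in every frame as it is created, including iframes, so code meant only for the top-level page may want to check `window === window.top` first.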

Khanh Luong

Since version 1.0.9, selenium-wire has gained the ability to modify responses to requests. Below is an example of this functionality that injects a script into a page before it reaches the web browser.

from gzip import compress, decompress

from lxml import html
from lxml.etree import ParserError
from lxml.html import builder
from seleniumwire import webdriver


def inject(req, req_body, res, res_body):
    # various checks to make sure we're only injecting the script on appropriate responses:
    # the content type must be HTML, the status code 200, and the encoding gzip
    if res.headers.get_content_subtype() != 'html' or res.status != 200 or res.getheader('Content-Encoding') != 'gzip':
        return None
    try:
        parsed_html = html.fromstring(decompress(res_body))
    except ParserError:
        return None
    try:
        # build a new element per response: an lxml element can only belong to one tree
        parsed_html.head.insert(0, builder.SCRIPT('alert("injected")'))
    except IndexError:  # no head element
        return None
    return compress(html.tostring(parsed_html))


drv = webdriver.Firefox(seleniumwire_options={'custom_response_handler': inject})
drv.header_overrides = {'Accept-Encoding': 'gzip'}  # ensure we only get gzip-encoded responses
drv.get('https://example.com')  # the script is injected into the <head> of this page

Another general way to control a browser remotely and inject a script before the page's content loads is to use a library based on a separate protocol entirely, e.g. the Chrome DevTools Protocol. The most fully featured one I know of is Playwright.

Mattwmaster58
  • Great tip! What does this line do: `injected.append((req, req_body, res, res_body, parsed_html))`? I didn't find what `injected` refers to – Jean Monet Mar 29 '20 at 14:51
  • 1
    It's simply a record of injected resources. I have removed it to avoid confusion. – Mattwmaster58 Mar 29 '20 at 17:18
  • Thanks! Do you know if the `custom_response_handler` injection function allows modifying the response headers? I see we can return the response `body`, but in my example I would also want to add or modify a header in the response. – Jean Monet Mar 30 '20 at 11:28
  • I'm not sure, you could try (over)writing some keys in `res.headers`. – Mattwmaster58 Mar 30 '20 at 18:19
  • It seems this feature was deprecated in January 2021 with v3: https://pypi.org/project/selenium-wire/ Do you know an alternative? – n.r. Jul 14 '21 at 09:47
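In newer selenium-wire releases, response modification goes through interceptors instead of `custom_response_handler`. The sketch below assumes selenium-wire 4.x, where a function assigned to `driver.response_interceptor` receives `(request, response)` and `seleniumwire.utils.decode` decompresses the body. The injected `alert` is hypothetical; it is placed right after the opening `<head>` tag, or at the very start of the document if there is none. The insertion itself is a plain function so it can be checked without a browser.

```python
import re

# Hypothetical script to inject; replace with your own code.
SCRIPT_TAG = b'<script>alert("injected")</script>'

# Matches <head> or <head ...attributes>, but not <header>.
HEAD_OPEN = re.compile(rb"<head(?:\s[^>]*)?>", re.IGNORECASE)


def inject_script(html: bytes) -> bytes:
    """Insert SCRIPT_TAG right after the opening <head> tag (or at the start)."""
    match = HEAD_OPEN.search(html)
    if match is None:
        return SCRIPT_TAG + html
    return html[: match.end()] + SCRIPT_TAG + html[match.end():]


def interceptor(request, response):
    """selenium-wire 4.x response interceptor that injects SCRIPT_TAG into HTML pages."""
    # Imported here so inject_script can be used without selenium-wire installed.
    from seleniumwire.utils import decode

    content_type = response.headers.get("Content-Type", "")
    if response.status_code != 200 or not content_type.startswith("text/html"):
        return
    body = decode(response.body, response.headers.get("Content-Encoding", "identity"))
    new_body = inject_script(body)
    response.body = new_body
    # The new body is uncompressed, so drop the encoding header and fix the length.
    del response.headers["Content-Encoding"]
    del response.headers["Content-Length"]
    response.headers["Content-Length"] = str(len(new_body))


if __name__ == "__main__":
    from seleniumwire import webdriver

    driver = webdriver.Firefox()
    driver.response_interceptor = interceptor
    driver.get("https://example.com")
```

Headers are deleted before being set because selenium-wire's header object appends on assignment rather than replacing.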

If you want to inject something into the html of a page before it gets parsed and executed by the browser I would suggest that you use a proxy such as Mitmproxy.
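A minimal sketch of a mitmproxy addon script for this, assuming a recent mitmproxy in which `flow.response.text` decodes the body (content encoding and charset) and assigning to it re-encodes it. The file name `inject.py` and the injected script are hypothetical, and only a bare `<head>` tag is matched to keep the example short.

```python
# inject.py: run with `mitmdump -s inject.py` (mitmproxy listens on port 8080 by default).

# Hypothetical script to inject; replace with your own code.
SCRIPT_TAG = "<script>window.__injectedFirst = true;</script>"


def response(flow) -> None:
    """mitmproxy event hook, called for every server response passing through the proxy."""
    content_type = flow.response.headers.get("content-type", "")
    if not content_type.startswith("text/html"):
        return
    # .text gives the decoded body; assigning to it re-encodes the response body.
    html = flow.response.text
    if "<head>" in html:
        flow.response.text = html.replace("<head>", "<head>" + SCRIPT_TAG, 1)
```

On the Selenium side, the browser then has to be pointed at the proxy (for Chrome, e.g. the `--proxy-server=http://127.0.0.1:8080` argument), and for HTTPS pages it must either trust mitmproxy's CA certificate or ignore certificate errors.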

Jonathan

If you cannot modify the page content, you may use a proxy, or use a content script in an extension installed in your browser. Doing it within Selenium, you would write some code that injects the script as a child of an existing element, but you won't be able to have it run before the page is loaded (which is when your driver's get() call returns).

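// The function body is a placeholder; executeScript only returns a value if the script uses `return`.
String name = (String) ((JavascriptExecutor) driver).executeScript(
    "return (function () { ... })();");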

The documentation leaves unspecified the moment at which the code starts executing. You would want it to run before the DOM starts loading, so that guarantee can probably only be satisfied via the proxy or extension content-script route.
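For comparison, a Python sketch of the in-Selenium route described above: after `driver.get()` returns, `execute_script` appends a new `<script>` element whose `textContent` is the code to run. As the answer says, this happens only after the page has loaded, so it cannot precede the page's own scripts. The names `APPEND_SCRIPT_JS` and `inject_after_load` are hypothetical.

```python
# JavaScript run through WebDriver; arguments[0] is the code to inject.
APPEND_SCRIPT_JS = """
var el = document.createElement('script');
el.textContent = arguments[0];
(document.head || document.documentElement).appendChild(el);
"""


def inject_after_load(driver, url: str, source: str) -> None:
    """Load `url`, then append an inline <script> that runs `source`.

    With the default page-load strategy, driver.get() blocks until the page
    has loaded, so the page's own scripts have already run by this point.
    """
    driver.get(url)
    driver.execute_script(APPEND_SCRIPT_JS, source)
```

An inline script appended this way executes as soon as it is inserted, so any globals it defines are available to later `execute_script` calls on the same page.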

If you can instrument your page with a minimal harness, you may detect the presence of a special URL query parameter and load additional content, but you need to do so using an inline script. Pseudocode:

<html>
  <head>
    <script type="text/javascript">
    (function () {
      if (location && location.href && location.href.indexOf("SELENIUM_TEST") >= 0) {
        var injectScript = document.createElement("script");
        injectScript.setAttribute("type", "text/javascript");

        // URL_OF_EXTRA_SCRIPT is a placeholder for the URL of your extra script.
        // Note: a script-inserted external script loads asynchronously, so later scripts in the
        // page may run first. Another option is to perform a synchronous XHR and inject the
        // code via textContent, which executes immediately.
        injectScript.setAttribute("src", URL_OF_EXTRA_SCRIPT);
        document.documentElement.appendChild(injectScript);

        // Optional cleanup: removing the element does not cancel the script, whose loading has already started.
        document.documentElement.removeChild(injectScript);
      }
    })();
    </script>
  ...
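On the Selenium side, the harness only needs the marker to appear in the URL. A small sketch, assuming the `SELENIUM_TEST` marker from the pseudocode and a hypothetical helper `with_test_flag`, that adds it as a query parameter before the URL is passed to `driver.get()`:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_test_flag(url: str, flag: str = "SELENIUM_TEST") -> str:
    """Return `url` with `flag=1` added to its query string, unless already present."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == flag for key, _ in query):
        query.append((flag, "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Usage, assuming `driver` is an existing Selenium WebDriver:
#     driver.get(with_test_flag("https://example.com/page"))
```

Because the inline check only looks for the marker anywhere in `location.href`, a query parameter is enough; the fragment and existing parameters are preserved.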
init_js
  • Thanks for this very concise and well-explained answer. I know that things have changed quite a bit in the 6+ years since you posted this, but the basic Java example still seems to work... except with Firefox 99. When I try this technique with Firefox, the `executeScript` call completes successfully, but the function I'm trying to inject doesn't appear to persist (`typeof myFunction == 'undefined'`). If I run the same code directly in Developer Tools console, however, I get the expected result (`typeof myFunction == 'function'`). Do you have any suggestions for diagnosing this issue? – Scott Babcock Apr 21 '22 at 19:57

So I know it's been a few years, but I've found a way to do this without modifying the web page's content and without using a proxy! I'm using the Node.js version, but presumably the API is consistent across other languages as well. What you want to do is as follows:

const {Builder, Capabilities} = require('selenium-webdriver');

(async () => {
  const capabilities = Capabilities.firefox();
  capabilities.setPageLoadStrategy('eager'); // Options are 'eager', 'none', 'normal'
  const driver = await new Builder().withCapabilities(capabilities).build();
  try {
    await driver.get('http://example.com');
    await driver.executeScript(`
      console.log('hello');
    `);
  } finally {
    await driver.quit();
  }
})();

That 'eager' option works for me. You may need to use the 'none' option. Documentation: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/lib/capabilities_exports_PageLoadStrategy.html

EDIT: Note that the 'eager' option has not been implemented in Chrome yet...

Jacob
  • Thanks! Was looking how to execute a script before the page is rendered and this works. I also got it to work in Chrome if anyone else comes across this. [Python Example](https://pastebin.com/cYLeFpv5) – 010011100101 Feb 25 '20 at 18:44
  • 2
    Doesn't work for me. This doesn't ensure the script will run before page load, it allows the script to run as soon as the page becomes interactive. – villasv Mar 06 '21 at 17:45
  • @010011100101 would you mind posting the code here as a solution? Thanks – n.r. Jul 14 '21 at 09:52
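Since a Python version was requested: a sketch assuming Selenium 4, where browser options expose a `page_load_strategy` attribute (an earlier comment reports 'eager' working in Chrome). As noted above, with `'eager'` the `get()` call returns once `document.readyState` is `'interactive'`, so the script runs earlier than with the default `'normal'` strategy but still after the page's parser-inserted scripts; it does not guarantee running first. The helper name `get_then_run` is hypothetical.

```python
def get_then_run(driver, url: str, source: str):
    """Navigate to `url` and run `source` as soon as get() returns.

    With page_load_strategy='eager', get() returns when document.readyState
    is 'interactive', i.e. before images and other subresources finish loading.
    """
    driver.get(url)
    return driver.execute_script(source)


if __name__ == "__main__":
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.page_load_strategy = "eager"  # 'normal' (default), 'eager' or 'none'
    driver = webdriver.Chrome(options=options)
    try:
        print(get_then_run(driver, "https://example.com", "return document.readyState"))
    finally:
        driver.quit()
```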