12

I'm trying to create a Firefox add-on (using the Add-on SDK) that will modify how the page is displayed, mostly as a training/learning exercise.

For some tasks (like augmenting pages with new functionality) using pageMod is perfectly fine. Page loads and I run some JS to show/hide/add elements.

My problem is: can I perform modification on DOM (so: the HTML document that is returned by server) before the page even starts displaying?

For example: the page returned from server is:

<html>
    <body>
        <table>
            <tr>
                <td>Item 1.1</td>
                <td>Item 1.2</td>
                <td>Item 1.3</td>
            </tr>
            <tr>
                <td>Item 2.1</td>
                <td>Item 2.2</td>
                <td>Item 2.3</td>
            </tr>
        </table>
    </body>
</html>

but I would like for FF to render instead:

<html>
    <body>
        <ul>
            <li>Item 1.1, Item 1.2, Item 1.3</li>
            <li>Item 2.1, Item 2.2, Item 2.3</li>
        </ul>
    </body>
</html>

Doing it after the page loads would first display the table, which would then quickly 'blink' into a list. That might be fast enough, but it is not sufficient if I wanted to, for example, change <img> tags into <a> to prevent unwanted image loads.

I looked at using contentScriptWhen: "start" in pageMod and tried attaching listeners, but I just can't see how I can actually modify the DOM 'on the fly' (or even prevent any kind of page display before the whole page has loaded).

I've checked the cloud-to-butt extension, as it does modify the page on the fly, but I wasn't even able to get it to work: when attached as a pageMod on start, the code failed on:

 document.getElementById('appcontent').addEventListener('DOMContentLoaded', function(e)

because document.getElementById('appcontent') was returning null.

I would be immensely thankful for some pointers: is it possible, how to attach the script, how to intercept the HTML and send it back on its way after some modifications.

EDIT: Ok, so I think that I'm able to intercept the data:

let { Ci,Cr,CC } = require('chrome');
let { on } = require('sdk/system/events');
let { newURI } = require('sdk/url/utils');
let ScriptableInputStream = CC("@mozilla.org/scriptableinputstream;1", "nsIScriptableInputStream", "init");
on('http-on-examine-response', function (event) {
    var httpChannel = event.subject.QueryInterface(Ci.nsIHttpChannel);
    var traceChannel = event.subject.QueryInterface(Ci.nsITraceableChannel);
    // Escape the dot so it only matches a literal "."
    if (/example\.com/.test(event.subject.URI.spec)) {
        traceChannel.setNewListener(new MyListener());
    }
}, true);

function MyListener() {
    // Chunks of the response body, collected in onDataAvailable
    this.data = [];
}

MyListener.prototype = {
    onStartRequest: function(request, ctx) {
        this.data = [];
    },

    onDataAvailable : function(request, context, inputStream, offset, count) {
        var scriptStream = new ScriptableInputStream(inputStream);
        this.data.push(scriptStream.read(count));
        scriptStream.close();
    },

    onStopRequest: function(request, ctx, status) {
        console.log(this.data.join(''));
    }
}

Now in onStopRequest I'd like to do something to the data and output it back to where it was originally going...

Note that this works on strings, not on the DOM, so it's not perfect, but it's a place to start :)
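Since the intercepted data is a plain string at this point, any transformation has to be done on text. Below is a minimal sketch of the table-to-list rewrite from the example above. `tableToList` is a hypothetical helper (not part of the SDK), and it assumes flat, well-formed markup with no attributes on the table tags; regex-based HTML rewriting is fragile on real pages.

```javascript
// Hypothetical helper: rewrite simple <table> markup into a <ul>.
// Assumes <table>, <tr> and <td> carry no attributes and are not nested.
function tableToList(html) {
    return html.replace(/<table>([\s\S]*?)<\/table>/g, function (match, body) {
        var items = [];
        var rowRe = /<tr>([\s\S]*?)<\/tr>/g;
        var row;
        while ((row = rowRe.exec(body)) !== null) {
            var cells = [];
            var cellRe = /<td>([\s\S]*?)<\/td>/g;
            var cell;
            while ((cell = cellRe.exec(row[1])) !== null) {
                cells.push(cell[1].trim());
            }
            // One list item per table row, cells joined by commas
            items.push("<li>" + cells.join(", ") + "</li>");
        }
        return "<ul>" + items.join("") + "</ul>";
    });
}
```

In the listener, this could be applied to the joined chunks, e.g. `tableToList(this.data.join(''))`, before the result is passed on.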

EDIT2:

Huh, I got it working, though I have a feeling I'm not really supposed to do this that way:

onStopRequest: function(request, ctx, status) {
    // Assumes `converter` is an nsIScriptableUnicodeConverter with its charset set,
    // and `this.originalListener` is the listener returned by setNewListener().
    //var newPage = this.data.join('');
    var newPage = "<html><body><h1>TEST!</h1></body></html>";
    var stream = converter.convertToInputStream(newPage);
    // convertToByteArray fills count.value with the length in bytes
    var count = {};
    converter.convertToByteArray(newPage, count);
    this.originalListener.onDataAvailable(request, ctx,
        stream, 0, count.value);

    this.originalListener.onStopRequest(request, ctx, status);
},
Gerino
  • I actually had to do the same thing recently in a FF extension, and decided to just "hack" it by adding a style tag as soon as the head was loaded that hides the entire body, then modify the HTML, and when done, remove the style tag, showing the body, no blinking, problem solved. There's probably better ways to do it. – adeneo Feb 17 '15 at 23:34
  • That's an interesting idea and might come in handy in the future :) Though I'm actually looking for a way to - amongst others - strip some scripts, remove some images (adblock-like) to - again, amongst others - save on the load time. – Gerino Feb 17 '15 at 23:55
  • If you want to intercept data and feedback modified data use nsITraceableChannel: https://github.com/Noitidart/demo-nsITraceableChannel – Noitidart Feb 18 '15 at 21:08

3 Answers

8

My problem is: can I perform modification on DOM (so: the HTML document that is returned by server) before the page even starts displaying?

Yes, JavaScript execution starts before the page is rendered for the first time. The DOM parser notifies mutation observers, so you can strip elements as soon as the parser adds them.

You can register mutation observers even in content scripts loaded with contentScriptWhen: "start". They should then be notified of every element added to the tree before it is rendered, because observer notifications are delivered from the microtask queue while rendering happens on the macrotask queue.
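As a rough sketch of that approach, a content script could look like the following. The choice of which nodes to drop (`shouldStrip`) is only an illustration, and both function names are made up for this example. Note that removing a node does not guarantee that a request for it has not already been started.

```javascript
// Illustrative filter: drop <img> and <script> elements.
// nodeName is upper-case for HTML elements in HTML documents.
function shouldStrip(node) {
    return node.nodeType === 1 &&
        (node.nodeName === "IMG" || node.nodeName === "SCRIPT");
}

// MutationObserver callback: remove unwanted nodes as the parser adds them.
// Returns the number of removed nodes, which is handy for debugging.
function stripAddedNodes(mutations) {
    var removed = 0;
    mutations.forEach(function (mutation) {
        Array.prototype.forEach.call(mutation.addedNodes, function (node) {
            if (shouldStrip(node) && node.parentNode) {
                node.parentNode.removeChild(node);
                removed++;
            }
        });
    });
    return removed;
}

// Only register when running inside a page (e.g. a content script
// loaded with contentScriptWhen: "start").
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
    new MutationObserver(stripAddedNodes).observe(document, {
        childList: true,  // watch for added/removed children
        subtree: true     // ...anywhere below the document node
    });
}
```

Observing `document` itself rather than `document.body` matters here, because the body may not exist yet when the script runs.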

but I wasn't even able to get it to work: when attached as a pageMod on start the code failed on: document.getElementById('appcontent').addEventListener('DOMContentLoaded', function(e)

Of course. You should not assume that any element in particular - not even the <body> tag - is already available that early during page load. You will have to wait for them to become available.

And the DOMContentLoaded event can simply be registered on the document object. I don't know why you would register it on some Element.
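For illustration, registering on the document could look like this. `onDomReady` is a hypothetical helper name; the readyState check is an extra precaution for the case where the script is attached after parsing has already finished.

```javascript
// Run callback once the DOM has been parsed, registering on the
// document object itself rather than on some element.
function onDomReady(doc, callback) {
    if (doc.readyState === "loading") {
        doc.addEventListener("DOMContentLoaded", function handler() {
            doc.removeEventListener("DOMContentLoaded", handler);
            callback(doc);
        });
    } else {
        // Parsing already finished; DOMContentLoaded will not fire again.
        callback(doc);
    }
}

// Only call it when a document is actually available (content script).
if (typeof document !== "undefined") {
    onDomReady(document, function (doc) {
        console.log("DOM ready:", doc.URL);
    });
}
```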

(or even prevent any kind of page display before the whole page has loaded).

You don't really want that because it would increase page load times and thus reduce responsiveness of the website.

the8472
  • I'd argue that AdBlock decreases page load times by preventing user from having to download ad images, executing ad scripts etc. - and though of course processing the response will take time, it might not actually end up in increasing page load time (YMMV). – Gerino Feb 18 '15 at 01:03
  • I'll have a look at the mutation observers, they might come in handy :) – Gerino Feb 18 '15 at 01:04
  • what you mention (preventing requests) is a separate concern from modifying the DOM and can be done efficiently without interrupting the page load, that's what `nsIContentPolicy`s are for. – the8472 Feb 18 '15 at 01:10
  • That's a side effect, not the main goal - see original question. I want to get a webpage, transform it (think: from a bloated webpage extract content and put into a lightweight html/css) and display only that lightweight page. – Gerino Feb 18 '15 at 01:26
4
/*
 * contentScriptWhen: "start"
 *
 * "start": Load content scripts immediately after the document
 * element is inserted into the DOM, but before the DOM content
 * itself has been loaded
 */

/*
 * use an empty HTMLElement both as a place_holder
 * and a way to prevent the DOM content from loading
 */
document.replaceChild(
        document.createElement("html"), document.children[0]);
var rqst = new XMLHttpRequest();
rqst.open("GET", document.URL);
rqst.responseType = 'document';
rqst.onload = function(){
    if(this.status == 200) {
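        /* edit the document (this selector is specific to the target page) */
        var unwanted = this.response.children[0].children[1].querySelector(
                "#content-load + div + script");
        if (unwanted) {
            /* querySelector returns null when nothing matches */
            unwanted.remove();
        }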

        /* replace the place_holder */
        document.replaceChild(
                document.adoptNode(
                    this.response.children[0]),
                document.children[0]);

        // use_the_new_world();
    }
};
rqst.send();
  • +1 for the idea. It works but scripts will not be loaded or executed, no idea why. They must be inserted manually. Also, it isn't necessary to request the page again via XMLHttpRequest. – brandon Jun 19 '16 at 22:17
  • @brandon, if re-requesting the page isn't necessary, what's the alternative? – stephancasas Nov 21 '21 at 01:35
0

If you want to get into it before any script has executed there are document observers here: https://developer.mozilla.org/en-US/docs/Observer_Notifications#Documents such as content-document-global-created

Noitidart