Chrome extension - get Html DOM before load js on browser

Question

I'm developing a Chrome extension that needs to block the loading of an HTML page, validate the JavaScript that comes with the page in my content script, and then proceed (or not) with loading the page.

With "run_at": "document_start" in my manifest, the content script gets an empty HTML document and can't do the validation. With "run_at": "document_end", the page's JavaScript has already executed, and only after that does my extension validate it...

Is there a way to set something like a DOMContentBeforeLoad in my content script? I'm really out of options.

Thanks

This could possibly help, but I'm not quite sure: http://stackoverflow.com/questions/8943219/how-to-stop-page-load-in-html-static-page – DotDotBeep, Jun 11 '15 at 22:50
This answer to another question might help you get started: http://stackoverflow.com/a/13270422/2488877 – b00t, Jun 11 '15 at 23:12
I already tried it; it still loads the page before it runs the JS of the content script. – ogoid, Jun 11 '15 at 23:28
I'm not posting this as an answer because I haven't tried it, but from the background page you could install a network request hook that can modify the response. It watches for the incoming page request and modifies the returned data, so Chrome then loads your modified version as if it were the original. https://developer.chrome.com/extensions/webRequest#type-BlockingResponse – Zig Mandel, Jun 12 '15 at 03:56
This seems like a good idea, but I'm not sure how to do it. There aren't many examples on the internet :/ – ogoid, Jun 12 '15 at 11:12
Give it a try and show what didn't work if you get stuck. It's too broad to answer here. – Zig Mandel, Jun 12 '15 at 12:32
I tried to put listeners in a background.js, but that didn't work. I'm almost sure this has to be done in the content script, since it is the only one that has access to the DOM. [link](https://developer.chrome.com/extensions/content_scripts) – ogoid, Jun 12 '15 at 13:53

score 3 · Answer 1 · answered Aug 20 '15 at 09:50

Take a look at how TopLevel.js works: https://github.com/kristopolous/TopLevel (interesting source at https://github.com/kristopolous/TopLevel/blob/master/toplevel.js)

It's a library you explicitly include in your page. When the browser reaches it and runs it, it immediately document.write()s a <plaintext> element with style='display: none', which stops the browser from parsing the rest of the page and hides the resulting plain text. (<plaintext> is a deprecated element that stops the browser from interpreting page content and treats all the HTML that follows as unparsed plain text: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/plaintext.)

TopLevel then parses the text content of the <plaintext> element itself (doing some templating, which is the point of the library) and document.write()s the resulting new content to the page by hand.

You should be able to do something similar: inject a <plaintext> element to stop the page being parsed by the browser, parse it yourself (or do whatever you want with it), and then potentially write out whatever you like (including the original content) to the page once you're happy.
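The three steps above (stop parsing, inspect the raw text, write out what you approve) could be sketched roughly as follows. This is only an illustration: every function name is hypothetical, `validate` is a predicate you would supply yourself, and it assumes `document.write()` behaves in your content script the way it does for a script included in the page, which I have not verified.

```javascript
// Sketch of the <plaintext> approach. All function names here are hypothetical.
// Assumption: `doc` behaves like the page's `document` and this runs while the
// page is still being parsed, as TopLevel does when it is included in a page.

const STOP_TAG = '<plaintext style="display: none">';

// Step 1: write the <plaintext> tag so the rest of the page arrives as text.
function stopParsing(doc) {
  doc.write(STOP_TAG);
}

// Step 2: read back the unparsed remainder of the page.
function readRawRemainder(doc) {
  const holder = doc.querySelector('plaintext');
  return holder ? holder.textContent : '';
}

// Step 3: replace the document with whatever was approved.
function writeApproved(doc, html) {
  doc.open();
  doc.write(html);
  doc.close();
}

// Wire the steps together. `validate` is a caller-supplied predicate that
// receives the raw text and returns true when the page may be shown.
function gatePage(doc, validate) {
  stopParsing(doc);
  doc.addEventListener('DOMContentLoaded', () => {
    const raw = readRawRemainder(doc);
    writeApproved(doc, validate(raw) ? raw : '<p>Blocked by validation.</p>');
  });
}
```

In a page you would call something like gatePage(document, myValidator). Note that anything the browser parsed before the <plaintext> tag was written is not part of the captured text, so the tag has to go in as early as possible.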

score 0 · Answer 2 · answered Jun 12 '15 at 01:21

I think that to do this you will have to keep what you did with document_start, then load the HTML page yourself via an ajax call and parse it yourself.

Browsers typically don't load all the scripts and then execute them. Loading and execution happen asynchronously, in page order, so there is no point you can hook where the JavaScript has loaded but nothing has executed (unless you control the content of the page as well).
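As a rough illustration of the "load it yourself and parse it" idea, here is a minimal sketch. The names extractScripts and fetchScripts are hypothetical, fetch stands in for the ajax call, and the regular expressions are a deliberate simplification rather than a real HTML parser, so they will miss edge cases such as a ">" inside an attribute value.

```javascript
// Rough sketch only: pull <script> tags out of raw HTML text.
// Regex matching is an assumption made for brevity; it is not a full HTML parser.
const SCRIPT_TAG = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;
const SRC_ATTR = /\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i;

// Returns one entry per script: `src` for external scripts (else null)
// and `code` for the inline text between the tags.
function extractScripts(html) {
  const scripts = [];
  for (const match of html.matchAll(SCRIPT_TAG)) {
    const srcMatch = SRC_ATTR.exec(match[1]);
    const src = srcMatch ? (srcMatch[1] ?? srcMatch[2] ?? srcMatch[3]) : null;
    scripts.push({ src, code: match[2] });
  }
  return scripts;
}

// Fetch a page as text and list its scripts without letting a browser parse it.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function fetchScripts(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  return extractScripts(await response.text());
}
```

This only gets you the list of scripts to validate. It does nothing to hold back the browser's own load of the page, which is the hard part.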

answered Jun 12 '15 at 01:21

Brian

Why would that work? It's unlikely that his own ajax request for the page would finish before Chrome's, which started that same request before yours. – Zig Mandel Jun 12 '15 at 03:53
Yeah, it's not a pleasant (or performant) proposition, but preloading and parsing the page is the only method I'm aware of that would actually let him see the contents before anything loaded. He'd then have to allow the page to load. Your suggestion may also work, but I'm not sure whether he could process all the contents before loading or just one at a time. – Brian Jun 12 '15 at 10:41
And how would I manage to do that ajax call, for a page that I'm already getting the response for? Thanks for the help. – ogoid Jun 12 '15 at 11:11
You'd have to stop the page's response until you were ready for it, and HTML can easily be pulled in via a GET request (that's all a browser is doing). Parsing it will be a huge pain, though there may be a library somewhere to do this. You obviously can take another approach if you don't like this solution, but I'm not sure you are going to find something that can pause DOM loading as you stream JavaScript through your end. You can likely process the JavaScript if you don't need to prevent loading, but doing both is not going to have a simple few-lines-of-code answer. – Brian Jun 12 '15 at 11:34
No, this answer will not work. It's also not clear why you suggest parsing the HTML. See my question about your answer above. – Zig Mandel Jun 12 '15 at 12:31
You'd have to stop the page processing before doing this for it to work. The original question required stopping *all* DOM loading before rendering the page. This is what's difficult and why your suggestion above will not work. If he just wants to process each JS file individually and then allow or disallow it, your suggestion will work perfectly. However, if the requirement is to process all the JavaScript before loading the DOM, he will essentially have to build a pre-processor as I described. The reason he'd have to parse the HTML is to actually get the JavaScript without loading the page. – Brian Jun 12 '15 at 13:28
Can you give me an example of how to do this, Brian? – ogoid Jun 12 '15 at 13:55
Not really... I think you might be underestimating how difficult this is to do. You would basically have to create an entire library to do this processing; there isn't a quick method. Browsers are designed to load content as quickly as possible as they receive it, so you'd need to build (or find) a parser and loop over the content. Doing this on a per-file basis is much easier using Zig's method above, but parsing every file on a page before the browser parses it is a very large task; it's more or less building an entire browser without the rendering parts. This will also affect performance. – Brian Jun 12 '15 at 14:53
