
I want to write an npm package that localizes an HTML page, given its URL:
1. Download the HTML page from the URL.
2. Parse the HTML file, extract all the JS, CSS, and image files it uses, and save these resources locally.
3. If these JS, CSS, and image files reference external resources themselves, localize those as well. For example, extract the background images referenced in the CSS.
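
As a rough illustration of step 3 for CSS, the url(...) references in a stylesheet can be collected and resolved against the stylesheet's own URL. This is only a sketch: the function name extractCssUrls and the example.com addresses are made up for illustration, and a regular expression will miss edge cases (such as @import with a bare string) that a real CSS parser would handle.

```javascript
// Collect url(...) references from CSS text and resolve them against the
// URL the stylesheet was downloaded from. Hypothetical helper, not a full parser.
function extractCssUrls(cssText, baseUrl) {
  // Matches url(x), url('x') and url("x"); created per call so no state leaks.
  var pattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  var found = [];
  var match;
  while ((match = pattern.exec(cssText)) !== null) {
    var ref = match[2].trim();
    // Inline data: URIs are already local, so there is nothing to download.
    if (ref.indexOf('data:') === 0) continue;
    found.push(new URL(ref, baseUrl).href);
  }
  return found;
}

var css =
  "body { background: url('../img/bg.png'); } " +
  ".a { background-image: url(http://cdn.example.com/x.jpg) } " +
  ".b { background: url(data:image/png;base64,AAAA) }";

console.log(extractCssUrls(css, 'http://example.com/css/site.css'));
```

Each returned URL can then be downloaded and the reference in the CSS replaced with the local path.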

The first and second requirements are easy to meet, but I have no idea how to handle the last one. I can parse all the CSS files and localize the resources used in them. But how can I parse the JS files?
For example, if the JS adds a 'script src=XXX' tag to the HTML DOM, how can I extract the src?

0o0zt
  • 11
  • 3

1 Answer


I think I would use a headless browser to capture every network call instead of trying to parse the code.

I haven't used it personally, but PhantomJS seems to fit the bill.

It can load a webpage, execute any scripts and apply any CSS the way a normal browser would when the page is requested, and then run your own code once the page is loaded.

The network monitoring features are probably what you'll want to use.
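
A minimal, untested-in-production sketch of that idea, assuming the script is run with the phantomjs binary: page.onResourceRequested is PhantomJS's network hook, while recordRequests and the example.com URL are placeholders introduced here for illustration. The PhantomJS-specific part is guarded so the helper can also be loaded elsewhere.

```javascript
// Attach a network listener to a PhantomJS-style page object and return
// the array that fills up with every URL the page requests.
function recordRequests(page) {
  var urls = [];
  page.onResourceRequested = function (requestData) {
    urls.push(requestData.url);
  };
  return urls;
}

// This part only runs under the PhantomJS runtime (phantomjs capture.js).
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var requested = recordRequests(page);
  // Placeholder URL; replace it with the page to localize.
  page.open('http://example.com/', function (status) {
    console.log('load status: ' + status);
    console.log(requested.join('\n'));
    phantom.exit();
  });
}
```

Requests triggered by scripts that run some time after the load callback would not be in the list yet, so a short delay before reading it may be needed.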

mgadrat
  • 154
  • 1
  • 5
  • If I don't parse the files, can I still change the external resource URLs in the original files to the new local resource URLs? For example, – 0o0zt Aug 02 '16 at 03:19
  • I didn't understand that; maybe your question could include a bit more context about what you're trying to achieve. For injected scripts, you can't do anything before they are actually inserted into the DOM. But you can listen to the DOM to detect newly inserted elements: (http://stackoverflow.com/questions/4780822/how-can-i-detect-when-a-new-element-has-been-added-to-the-document-in-jquery) – mgadrat Aug 02 '16 at 03:48
  • Wow, I did not see that you literally said you wanted to use them locally in your original question. My bad. – mgadrat Aug 02 '16 at 03:57
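
One way to listen for inserted script tags, as suggested in the comments, is the standard MutationObserver API. This is a sketch under that assumption rather than anything taken from the linked question; scriptSrcsFrom is a hypothetical helper name, and the observer part only runs where a DOM is available.

```javascript
// Given the mutation records passed to a MutationObserver callback,
// return the src of every <script> element that was added.
function scriptSrcsFrom(mutations) {
  var srcs = [];
  mutations.forEach(function (mutation) {
    var nodes = mutation.addedNodes;
    for (var i = 0; i < nodes.length; i++) {
      if (nodes[i].tagName === 'SCRIPT' && nodes[i].src) {
        srcs.push(nodes[i].src);
      }
    }
  });
  return srcs;
}

// Only runs inside a page (or a headless browser), where the DOM exists.
if (typeof MutationObserver !== 'undefined' && typeof document !== 'undefined') {
  var observer = new MutationObserver(function (mutations) {
    scriptSrcsFrom(mutations).forEach(function (src) {
      console.log('script added: ' + src);
    });
  });
  // Watch the whole document for added nodes, at any depth.
  observer.observe(document.documentElement, { childList: true, subtree: true });
}
```

The observer has to be registered before the page's own scripts run, otherwise early insertions are missed.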