
As you all know, external resources such as images can be embedded into an HTML file as base64-encoded data URIs:

<img src="..." />
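For reference, a data URI is just the MIME type followed by the base64-encoded bytes. Below is a minimal sketch, assuming the raw bytes and their MIME type are already in hand. The helper name `toDataUri` is hypothetical, not a library function. It relies on the `btoa` global, which browsers provide and which Node.js has also exposed since v16:

```javascript
// Hypothetical helper: turn raw bytes plus a MIME type into a data URI.
// btoa() only accepts "binary strings" (one character per byte, codes 0-255),
// so each byte is first mapped to the character with that code.
function toDataUri(bytes, mimeType) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return "data:" + mimeType + ";base64," + btoa(binary);
}

// Usage: the two bytes of the ASCII text "Hi"
console.log(toDataUri(new Uint8Array([72, 105]), "text/plain"));
// prints "data:text/plain;base64,SGk="
```

Building the string byte by byte keeps every character in the 0 to 255 range that `btoa` requires. Passing arbitrary Unicode text straight to `btoa` would throw instead.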

I'm looking for a pure, browser-based JavaScript way to traverse an HTML page and embed all of its external resources into the document, so that calling $("html").html() returns the page's entire contents, including those external resources.

For context: I'm trying to save web pages as single files using a headless browser on my server.
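A minimal sketch of the traversal described above, limited to `<img>` elements. It assumes a modern browser engine with `fetch`, `FileReader` and `async`/`await`; older engines, such as the one bundled with PhantomJS, may lack these and need `XMLHttpRequest` instead. The function names `fetchAsDataUri` and `inlineImages` are hypothetical. Cross-origin images can only be fetched if their server sends CORS headers or the headless browser relaxes that check. Failures are logged and the original `src` is kept. Stylesheets, fonts, scripts, `srcset` and CSS `background-image` URLs would need similar passes.

```javascript
// Browser-only default loader: fetch a URL and read the body as a data URI.
// FileReader.readAsDataURL builds the "data:<mime>;base64,<payload>" string,
// taking the MIME type from the Blob (i.e. from the response's Content-Type).
async function fetchAsDataUri(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error("HTTP " + response.status + " for " + url);
  const blob = await response.blob();
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(blob);
  });
}

// Walk every <img> that has a src and replace the src with an embedded data URI.
// `doc` defaults to the current document and `load` to fetchAsDataUri; both are
// parameters so the traversal can be exercised without a real network.
async function inlineImages(doc = document, load = fetchAsDataUri) {
  const images = Array.from(doc.querySelectorAll("img[src]"));
  await Promise.all(images.map(async (img) => {
    const src = img.getAttribute("src");
    if (src.startsWith("data:")) return; // already embedded
    try {
      // The img.src property is the absolute URL resolved against the page.
      img.setAttribute("src", await load(img.src));
    } catch (err) {
      console.warn("Could not inline", src, err); // keep the original src
    }
  }));
}

// Usage inside the page (e.g. the headless browser's page context):
// await inlineImages();
// const html = document.documentElement.outerHTML;
```

Once `inlineImages()` resolves, `document.documentElement.outerHTML` contains the embedded images. The question's `$("html").html()` works too, but it omits the `<html>` tag itself.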

Mehran
  • If you're using JS, why encode the images? – Mooseman Oct 27 '14 at 19:34
  • Because JS can easily traverse all the HTML elements. Otherwise I'd need a parser to turn the tags into DOM objects before I could query them for external resources. – Mehran Oct 27 '14 at 19:37

2 Answers


There are tools out there to do that. Examples:

While there are benefits to this approach, remember that a page visited more than once, or a site whose pages share the same JS/CSS files, benefits from client-side (browser) caching, which inlining those resources into every page gives up.

JAR.JAR.beans
  • I'm sorry, I forgot to mention that by JavaScript I mean browser-based JavaScript. I'm looking for a non-Node.js solution. – Mehran Oct 27 '14 at 20:02
  • The tools I suggest run once on the server to generate the client-side JS/CSS. They are not a server-side solution, just tools. – JAR.JAR.beans Oct 29 '14 at 06:48
  • I know, but I'm looking for a solution that uses a web browser. I find that much more robust than Node.js, since a web browser's parser is more powerful than any other. I intend to use PhantomJS with JavaScript. – Mehran Oct 29 '14 at 07:23
  • Has anyone had experience with those tools, and could you recommend one? – holzkohlengrill Feb 13 '19 at 12:25

Browser extensions

There is the Save Page WE extension, available for both Firefox and Chrome.

This extension can scroll or zoom out the page so that lazy-loaded resources are fetched before the page is saved.
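The extension's own implementation isn't shown here, but the scrolling idea can be sketched for a page loaded in a headless browser. This is a minimal sketch under assumptions: the name `scrollThroughPage` and its `pauseMs`/`maxSteps` options are hypothetical. Lazy loaders that wait longer than `pauseMs`, or that react to something other than scrolling, would not be triggered by it.

```javascript
// Hypothetical helper: scroll a window down one viewport at a time, pausing
// after each step so lazy-loading images get a chance to start fetching.
// Stops at the bottom of the page or after maxSteps (a guard for pages with
// infinite scrolling), then scrolls back to where it started.
async function scrollThroughPage(win = window, { pauseMs = 200, maxSteps = 50 } = {}) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const startY = win.scrollY;
  let steps = 0;
  while (steps < maxSteps) {
    // Re-read the height on every step: lazy content can make the page grow.
    const pageHeight = win.document.documentElement.scrollHeight;
    if (win.scrollY + win.innerHeight >= pageHeight) break; // at the bottom
    win.scrollBy(0, win.innerHeight);
    steps++;
    await sleep(pauseMs);
  }
  win.scrollTo(0, startY);
  return steps; // number of viewport-sized scroll steps taken
}

// Usage, before serializing the page:
// await scrollThroughPage(window, { pauseMs: 500 });
```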

Command line tools

There is also the inliner npm module, which exposes the inliner command-line utility. It works with some URLs but throws an error with others. It writes its output to stdout, so the output has to be redirected to a file, e.g. inliner https://http.cat > cats.html.

It can be installed with (assuming you have Node.js and npm):

npm install -g inliner
ccpizza