1

I need a way to take a web page that's already loaded in the browser and save its full DOM as an HTML string, such that if I later loaded that HTML offline as a single file, it would preserve the effects of all the CSS and of whatever scripts had run before it was saved. Keeping the images would be a bonus, but having them missing with a placeholder, so that the layout is preserved, is fine.

The catch is that I can't reload or re-request any of the resource files (JS/CSS). Fonts are not important.

This means the resulting HTML can't refer to external files. Is this even possible using just JavaScript?

EDIT:

1) This needs to be a programmatic solution using JavaScript, not a browser UI solution.

Whookie
  • 133
  • 2
  • 8
  • 1
    I suppose you could make AJAX requests for all the external resources and then inline them before you output the HTML string. You might even be able to find a JS library to base64 encode the images so you can inline them as well. Seems like a lot of code/work though. – Dominic P Oct 09 '14 at 02:50
  • Check out appcache? http://www.html5rocks.com/en/tutorials/appcache/beginner/ – Alfredo Delgado Oct 09 '14 at 02:52
  • Unfortunately I can't make AJAX requests because I can't assume that I will still have a valid session if the page is secured by a login of some type. I need a solution that literally takes what is in the page and preserves it. Is there no way to introspect the current state of every element and decorate the HTML after dumping it from the DOM such that it keeps the layout intact? – Whookie Oct 09 '14 at 02:53
  • Are you just trying to preserve the visual look? Or, the full interactivity (e.g. javascript) too? The former is more possible than the latter. – jfriend00 Oct 09 '14 at 03:02
  • @Whookie: He means make the AJAX requests when CAPTURING the page, not when LOADING the page. Basically find all the CSS files, read their content, and paste them into a big `<style>` tag. – slebetman Oct 09 '14 at 03:35
  • @slebetman I don't know what you mean, my script is only able to run after the page has already loaded, and it's being injected by the browser as part of a browser API. @-jfriend00 I don't care at all about interactivity, just the visual look (layout primarily, and text, images are a bonus only). – Whookie Oct 09 '14 at 04:03
  • Try `right-click` -> `Save As...` -> `Webpage complete` . See also http://stackoverflow.com/questions/18077217/is-there-a-way-to-save-css-js-changes-of-remote-resource-between-page-reloads-or/22683090#22683090 – guest271314 Oct 09 '14 at 04:10
  • @Whookie: Yes, since the page is already loaded, your script can of course call `getElementsByTagName('link')` to get the URLs of all the CSS files. Then use AJAX to download them. Then take their `responseText` and make a big string out of it: `"<style></style>"`. Then append that string to the string you got from the DOM: `"<style>" + css1 + css2 + "</style>" + document.body.innerHTML`. Now that giant string is your entire live DOM + all CSS. – slebetman Oct 09 '14 at 04:13
  • @guest271314: Unfortunately `save as` saves the original source HTML, not the final rendered live DOM. So the results may be different from what's on the page when saved. – slebetman Oct 09 '14 at 04:14
  • @slebetman, check that, the problem is that css files can load other css files, and this approach would lose any 2ndary files wouldn't it? Unless I were to recursively parse every CSS file until I didn't see any @-import statements, I could end up losing important CSS style info. Am I wrong? – Whookie Oct 09 '14 at 04:20
  • @Whookie: You're correct. Dominic and myself simply assumed that that would be obvious to any web developer and assumed you'd know what we mean by AJAXing it without describing every step in detail. – slebetman Oct 09 '14 at 04:28
  • @slebetman, yes understood. I suppose I could resolve the @-import-ed CSS files as well by simply checking for those within whatever CSS I pulled down. I suppose even images could be accounted for by pulling them down, rendering them within Canvas, and then pulling out their base64 encoded data and replacing all image tags within the DOM with the base64 data. Sounds like you can get almost 100% reproduction. – Whookie Oct 09 '14 at 04:37
  • A possible alternate approach that I've never used but that seems promising would be the [window.getComputedStyle](https://developer.mozilla.org/en-US/docs/Web/API/Window.getComputedStyle) method. I suppose you could loop through the entire DOM, get the style for each element/pseudo-element, and store that information as inline style attributes. – Dominic P Oct 09 '14 at 20:04
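
The `getComputedStyle` idea from the last comment is the one suggestion here that needs no network requests, so a rough sketch of it may be useful. This is only an illustration under assumptions, not a tested solution: the function names `styleToInlineCss` and `snapshotWithInlineStyles` are made up for this example, and `win` is assumed to be the page's `window` (it is passed in so the logic can be exercised with stubs). Pseudo-element styles (`::before`, `::after`) cannot be written into a `style` attribute, so they are not captured, and images still point at their original URLs; only their computed size is preserved.

```javascript
// Turn a CSSStyleDeclaration-like object (length, indexed property names,
// getPropertyValue) into the text of an inline style attribute.
function styleToInlineCss(computed) {
  let css = '';
  for (let i = 0; i < computed.length; i++) {
    const prop = computed[i];
    css += prop + ':' + computed.getPropertyValue(prop) + ';';
  }
  return css;
}

// Clone `root`, copy the computed style of every live element onto the
// matching element of the clone, and return the clone as an HTML string.
// Styles are read from the live elements because a detached clone has no
// useful computed style of its own.
function snapshotWithInlineStyles(root, win) {
  const clone = root.cloneNode(true);
  // A deep clone has the same document order, so indexes line up.
  const originals = [root, ...root.querySelectorAll('*')];
  const copies = [clone, ...clone.querySelectorAll('*')];
  originals.forEach((el, i) => {
    copies[i].setAttribute('style', styleToInlineCss(win.getComputedStyle(el)));
  });
  // Drop scripts and external stylesheet links so the saved file does not
  // refer to outside resources.
  clone.querySelectorAll('script, link[rel="stylesheet"]')
    .forEach((node) => node.remove());
  return clone.outerHTML;
}
```

In a page this would be called as `snapshotWithInlineStyles(document.documentElement, window)`. Expect the output to be large, since every element carries its full list of computed properties.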

1 Answer

-1

You can store the entire HTML, along with inline CSS, as a variable in JavaScript that you write yourself. Maybe you can also write some JS that uses HTML5 local storage to store the external JS/CSS resources and then uses them later when loading the page offline.

  • I wouldn't consider this an attempt at an answer, but I won't downvote because I understand you don't have enough reputation to use the comments section instead... – Marty Oct 09 '14 at 03:52