18

How is it possible to programmatically save a web page snapshot with all its elements (css, js, images, ...) into one file?

I need to archive some web pages regularly. However, just saving their HTML code is useless, not only because the images are missing but especially because the absence of CSS on today's pages can turn a web page into an unrecognizable mess.

I remember the .mht format that worked like this, but that required manual saving, and it was just a feature of IE. I believe there is an open-source solution that can achieve this programmatically, but despite hours of searching I cannot find it on the web.

Vacilando

5 Answers

12

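HTTrack, with the -%M option (--mime-html), which writes the mirror as a single MIME-encapsulated archive (.mht).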
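
For reference, an .mht file is a MIME multipart/related message (MHTML, RFC 2557): the HTML is the first part and each resource is a further part labelled with a Content-Location header. The sketch below only illustrates that container format using Python's standard email package; it is not how HTTrack builds its output. The function name `build_mhtml` and the `resources` mapping are hypothetical, fetching the resources is left out, and whether a particular browser opens the result is not guaranteed.

```python
import mimetypes
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Mapping


def build_mhtml(html: str, page_url: str, resources: Mapping[str, bytes]) -> bytes:
    """Pack a page and its resources into one multipart/related message.

    `resources` maps each resource URL to its raw bytes (hypothetical input;
    downloading them is out of scope for this sketch).
    """
    archive = MIMEMultipart("related", type="text/html")

    # The HTML document is the root part of the archive.
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    archive.attach(root)

    # Every resource becomes a base64-encoded part keyed by its URL.
    for url, data in resources.items():
        guessed, _ = mimetypes.guess_type(url)
        maintype, subtype = (guessed or "application/octet-stream").split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()
```

Writing the returned bytes to a file ending in .mht gives a single-file snapshot; the HTML must reference the resources by the same URLs used as Content-Location values.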

reisio
  • It doesn't download the JavaScript – nest Jan 20 '15 at 15:57
  • There isn't any JavaScript worth downloading that you wouldn't have loaded directly (& therefore saved directly). That said: You could do an ordinary httrack, without -%M, and then put that into an archive. With things like archivemount you can open them seamlessly, even though you don't need to. All easily scripted. Stack Overflow sucks. – reisio Aug 27 '17 at 01:17
9

Use wget in a terminal:

wget -p -k http://www.example.com/

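This downloads the page together with the files needed to display it (HTML, CSS, JS, SVG, etc.; that is what -p does) and rewrites the links so that they work locally (-k). It does not produce a single file as asked, though. Instead, it recreates the folder structure of the site.

For example, if the folder structure of www.example.com is: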

 /css/*
 /js/*
 /index.html

then it'll create the same structure locally.

Docs: https://www.gnu.org/software/wget/manual/wget.html
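
To get from that folder to the single file the question asks for, one option is to pack the downloaded tree into an archive. The following is a minimal sketch, assuming wget is on the PATH; the function names and the choice of zip are illustrative and not part of the original answer. The only wget flag added to the command above is -P, which sets the download directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def wget_command(url: str, dest_dir: Path) -> List[str]:
    # -p and -k are the flags from the answer; -P chooses the target directory.
    return ["wget", "-p", "-k", "-P", str(dest_dir), url]


def pack_directory(src_dir: Path, zip_base: Path) -> Path:
    """Pack everything under src_dir into <zip_base>.zip and return its path."""
    # Keep zip_base outside src_dir so the archive does not include itself.
    return Path(shutil.make_archive(str(zip_base), "zip", root_dir=str(src_dir)))


def snapshot_to_zip(url: str, work_dir: Path, zip_base: Path) -> Path:
    """Download one page with wget, then bundle the result into one file."""
    work_dir.mkdir(parents=True, exist_ok=True)
    # wget may exit non-zero when a single resource fails, so the exit code
    # is not treated as fatal; an empty download directory is.
    subprocess.run(wget_command(url, work_dir), check=False)
    if not any(work_dir.iterdir()):
        raise RuntimeError(f"wget downloaded nothing for {url}")
    return pack_directory(work_dir, zip_base)
```

Calling `snapshot_to_zip("http://www.example.com/", Path("mirror"), Path("snapshot"))` from a scheduled job would leave a `snapshot.zip` per run; adding a date to the hypothetical `zip_base` name keeps successive snapshots apart.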

Zain Khalid
2

I think @reisio (+1) has you covered...

...But if only to plug a great free tool, I would point out the Firefox extension Save Complete, which does an admirable job of grabbing "complete" pages on an ad hoc basis. The output will be a single HTML file with an accompanying directory stuffed with all the resources - you can easily zip them up for archiving.

It's not without fault: I've had issues with corrupted .png files lately on OS X, but I use it frequently for building mockups off of live pages and it's a huge time-saver. (Also of note, it hasn't been updated for FF 4 yet, and is the sole reason I rolled back to 3.6.)

peteorpeter
  • How is this method automated, or even programmable? – Christian Apr 11 '11 at 22:10
  • It's much _more_ automated than manually collecting all the resources and migrating the references, etc. See this caveat: "on an _ad hoc_ basis"? I'm not claiming it's _the perfect_ solution, but might be useful to people trying to achieve a similar, semi-automated result. Also, for the sake of argument, you could script FF to automate this further: http://macscripter.net/viewtopic.php?id=21304. (Do you think all potentially helpful, but imperfect, solutions should be -1'ed? I'm resisting the urge to down-vote your own imperfect, yet potentially helpful answer. Spirit foul.) – peteorpeter Apr 11 '11 at 23:18
  • Semi perfect? It works, it's not browser dependent, and it's more automated than trying to script Firefox! Are we back to "viewable by Firefox only" era again, or something? My solution can be done with any language on any platform. Your solution seems to work on firefox on a mac only. Plus firing a browser just to do some text manipulation sounds ridiculously over-engineered. – Christian Apr 12 '11 at 07:34
  • I'm not knocking your answer - for the record it sounds like the cleanest solution to the question asked. My hackles were raised by your attitude, not your knowledge. – peteorpeter Apr 12 '11 at 13:19
  • You could call it "overly defensive" if you want to. – Christian Apr 12 '11 at 14:53
0

Apple's Safari has a pretty good solution. It saves all HTML and CSS (sadly no JS) in a format called webarchive. It's one file, but it requires Safari to save and open, and Safari requires a Mac. Even though Safari for Windows does exist, it's too old to work with today's web pages, and it doesn't support saving or opening webarchive files. If you have a Mac, open any website in Safari, press ⌘S, and make sure that Web Archive is selected in the drop-down menu.

There is also a Chrome extension that can open these types of files, but not save them.

Apologies for replying to such an old thread, just wanted to spread this info!

John R.
0

If you are using Google Chrome, just use the "Save page as" menu entry (Ctrl+S) and select "Webpage, Complete" from the options at the bottom of the file dialog. This saves the HTML and all required resources (in a separate folder).
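
An HTML file plus a resource folder can be collapsed into one file by embedding the resources as base64 data: URIs. The code below is a rough sketch of that idea, not something Chrome does itself: the function name `inline_resources` is hypothetical, it uses a simple regular expression instead of a real HTML parser, and it only rewrites src and href attributes that point to existing local files.

```python
import base64
import mimetypes
import re
from pathlib import Path
from urllib.parse import unquote

# Matches src="..." and href="..." attributes with either quote style.
ATTR = re.compile(r'\b(src|href)=(["\'])(.*?)\2', re.IGNORECASE)
# Anything with a scheme (http:, data:, ...), protocol-relative, or a fragment.
NON_LOCAL = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:|//|#")


def inline_resources(html_path: Path) -> str:
    """Return the page's HTML with local resources embedded as data: URIs."""
    base = html_path.parent
    html = html_path.read_text(encoding="utf-8")

    def replace(match: "re.Match[str]") -> str:
        attr, quote, ref = match.groups()
        if NON_LOCAL.match(ref):
            return match.group(0)  # leave remote links and fragments alone
        target = base / unquote(ref.split("#")[0].split("?")[0])
        mime, _ = mimetypes.guess_type(target.name)
        # Skip missing files, unknown types, and links to other HTML pages.
        if not target.is_file() or mime in (None, "text/html"):
            return match.group(0)
        payload = base64.b64encode(target.read_bytes()).decode("ascii")
        return f"{attr}={quote}data:{mime};base64,{payload}{quote}"

    return ATTR.sub(replace, html)
```

References inside the stylesheets themselves (for example `url(...)` for fonts or background images) are not rewritten by this sketch, so pages that rely on them would need an extra pass.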

ProTom