3

I followed the instructions provided in this previous post. I am able to download a working local copy of the webpage (e.g. wget -p -k https://shapeshed.com/unix-wget/), but I would like to integrate all the files (JS, CSS and images, e.g. using Base64 encoding) into a single HTML file (or another convenient format). Would this be possible?

mat

4 Answers

3

Try HTTrack.

It is an efficient, easy-to-use website copier: you just paste the link of the website you want to copy locally.

Since you want everything in a single page, follow these steps:

  1. Minify all the stylesheets and put them in a <style> element in your main HTML page (use a CSS minifier).
  2. Minify all the scripts and put them inside a <script> element in the same file (use a JavaScript minifier).
  3. To deal with images, use sprites.
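Steps 1 and 2 could be scripted roughly as follows. This is only a minimal sketch in Python: it assumes the page references its assets with plain <link rel="stylesheet" href="..."> and <script src="..."></script> tags, it skips minification, and the names inline_assets and read_text are made up for illustration. A regex-based rewrite like this will miss unusual markup, and a script whose text contains "</script>" would break the result.

```python
import re

# Patterns for the two kinds of tags this sketch understands (an assumption
# about the page's markup, not a general HTML parser).
LINK_RE = re.compile(r'<link\b[^>]*\brel=["\']stylesheet["\'][^>]*>', re.IGNORECASE)
HREF_RE = re.compile(r'\bhref=["\']([^"\']+)["\']', re.IGNORECASE)
SCRIPT_RE = re.compile(
    r'<script\b[^>]*\bsrc=["\']([^"\']+)["\'][^>]*>\s*</script>', re.IGNORECASE
)


def inline_assets(html, read_text):
    """Replace linked stylesheets and scripts with inline <style>/<script>.

    read_text is a caller-supplied function (hypothetical name) that maps an
    href/src value to the text of that file, e.g. by reading the copy that
    wget saved locally.
    """

    def inline_css(match):
        href = HREF_RE.search(match.group(0))
        if href is None:
            return match.group(0)  # no href found: leave the tag untouched
        return "<style>\n" + read_text(href.group(1)) + "\n</style>"

    def inline_js(match):
        return "<script>\n" + read_text(match.group(1)) + "\n</script>"

    html = LINK_RE.sub(inline_css, html)
    return SCRIPT_RE.sub(inline_js, html)
```

For a page saved with wget -p -k, read_text could be something like lambda ref: (page_dir / ref).read_text(encoding="utf-8"), where page_dir is a pathlib.Path pointing at the folder that holds the downloaded HTML file.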
Ani
1

It certainly can be done. But you'll have to do a couple of simple things manually, since there are no tools available to automate some of the steps.

  1. Download the web page using Wget with all dependencies.
  2. Copy the contents of linked stylesheets and scripts into the main HTML file.
  3. Convert the images referenced in the HTML and CSS to Base64 data URIs, then insert them into the main HTML file.
  4. Minify the edited HTML file.
  5. Convert the HTML file to a Base64 data URI.

Here is an example of a single-page application encoded as a Base64 data URI, created to demonstrate the concept (copy and paste the code below into your web browser's address bar):

data:text/html;charset=utf-8;base64,PCFkb2N0eXBlIGh0bWw+DQo8aHRtbCBsYW5nPSJlbiI+DQoJPG1ldGEgY2hhcnNldD0idXRmLTgiPg0KCTx0aXRsZT5TaW5nbGUtcGFnZSBBcHBsaWNhdGlvbiBFeGFtcGxlPC90aXRsZT4NCgk8c3R5bGU+DQoJCS8qIENvZGUgZnJvbSBDU1MgZmlsZXMgZ29lcyBoZXJlLiAqLw0KCQlib2R5IHsNCgkJCWZvbnQtZmFtaWx5OiBzYW5zLXNlcmlmOw0KCQl9DQoJCWJ1dHRvbiB7DQoJCQlkaXNwbGF5OiBibG9jaw0KCQl9DQoJPC9zdHlsZT4NCgk8c2NyaXB0Pg0KCQkvLyBDb2RlIGZyb20gLmpzIGZpbGVzIGdvZXMgaGVyZS4gDQoJCWZ1bmN0aW9uIGNoYW5nZVBhcmFncmFwaCgpIHsNCgkJICAgIGRvY3VtZW50LmdldEVsZW1lbnRzQnlUYWdOYW1lKCJwIilbMF0uaW5uZXJIVE1MID0gIkNvbnRlbnQgb2YgcGFyYWdyYXBoIGNoYW5nZWQuIjsNCgkJfQ0KCTwvc2NyaXB0Pg0KCTxib2R5Pg0KCQk8aW1nIHNyYz0iZGF0YTppbWFnZS9wbmc7YmFzZTY0LGlWQk9SdzBLR2dvQUFBQU5TVWhFVWdBQUFVQUFBQUR3QkFNQUFBQ0RBNkJZQUFBQU1GQk1WRVZVVmx1T2o1TC8vLzlrWm1xbXA2bUJnb1dhbTUyeHNyTnpkSGk4dkw3dDdlNzI5dmJHeDhqazVPVFEwZExhMnR2SHNtSDhBQUFDSjBsRVFWUjRBZXpCZ1FBQUFBQ0FvUDJwRjZrQ0FBQUFBQUFBQUFBQUFBQUFBQUFBWUExdElLU2twRERxUUdMQXFBTkhIY2dzSWd3a3d4SUJ6SllCaEJSaEdJYmZiWGZiMWUzcU5vRUU5NVN1bTJuM1Z1SndNSHNRa0FGUVpBVUF4bDA2UU9zRXVNaENDTWNRQVRFWEJhaURBOGdFSUpJQXNKYUFNdmsrVGdrQTVuL2cvN3p2NE9HYitZMmN4djdqVkVaMzRLZG5kNStrTlFudXd1b2NNbDJCOTVZZUZoRHZTVHFmRTAwdldhV3RBcUtrTnNHcndFWUw0S1BrSjNFcW5WanNndTBTWURTdVM5Qk1lQUN3WnFGenJBN0dyZ2x1NHl6cUVuUnlnSkdVdzlzU050ekt5YlNFelNXczF5VzR1WjhEcDY4QXRlR1dXaEJaTVp6TWdhd0J3M0d6SkI3WEpQaFoyN0N1aGd0VzFVSXFRVXY0WXFwa1BiZ21IVUJTazJDaUh0ejA3Y294T1JVdzlTbTdBQXVwRHkvcXVtYlVzY20xcEdkSHZ3RUVTRlpuNTNCZ0VZTGdJUTVOd0o4aHV4MlNZTHZBUVlFS1hvVG81YVQ4ZjhXZkJrWWFnT0FCTEh4U0RvbFVRcllDMytUVUwrZ3JWYk1BZlljM1Z2ZzFjeXoxcWlvTFEvQ0RuZ042QlBGcGVYWlJ6NXB6U0FJUVhBRytBcWlQVVVCbXhYQUprUUlRN0dEa1o5OXp2UFBQejhKYUNJSTZBYTc3ZEI5NDdlOWt0d1NJVjRNUWJPV01VcDkwci9veGRrRjFjb2oyRkFiZHdWaC9zUlZiZUhreVUyQThyYXBVV3NKVVliSUQ3MllQSVZhZzlNRzVvVUJwbGppSlFtVUw0NmZDNWM1UjlldFBlM0FnQUFBQWdBQm83UEZYR0tCcUFBQUFBQUFBQUFBQUFBQUFBQUFBQUxnTmtYVy9TUloxSldBQUFBQUFTVVZPUks1Q1lJST0iIGFsdD0iIj4NCgkJPGgxPlNpbmdsZS1wYWdlIEFwcGxpY2F0aW9uIEV4YW1wbGU8L2gxPg0KCQk8cD5UaGlzIGlzIGFuIGV4YW1wbGUgb2YgYSB3ZWIgYXBwIHRoYXQgaW50
ZWdyYXRlcyBIVE1MLCBDU1MsIEphdmFTY3JpcHQsIGFuZCBhbiBpbWFnZSBpbnRvIG9uZSAuaHRtbCBmaWxlIHRoYXQgaXMgZW5jb2RlZCB0byBCYXNlNjQuPC9wPg0KCQk8YnV0dG9uIHR5cGU9ImJ1dHRvbiIgb25jbGljaz0iY2hhbmdlUGFyYWdyYXBoKCkiPkNoYW5nZSBQYXJhZ3JhcGg8L2J1dHRvbj4NCgk8L2JvZHk+DQo8L2h0bWw+
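
Steps 3 and 5 (building the data URIs) can be sketched in a few lines of Python. This is an illustration under assumptions, not a complete tool: the helper names to_data_uri, image_data_uri and html_data_uri are hypothetical, the image MIME type is guessed from the file name, and the HTML is assumed to be UTF-8, matching the data:text/html;charset=utf-8;base64, prefix used in the example above.

```python
import base64
import mimetypes


def to_data_uri(data, mime, charset=None):
    """Build a Base64 data URI from raw bytes and a MIME type."""
    header = "data:" + mime
    if charset:
        header += ";charset=" + charset
    return header + ";base64," + base64.b64encode(data).decode("ascii")


def image_data_uri(filename, data):
    """Data URI for an image; the MIME type is guessed from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    # Fall back to a generic type when the extension is not recognised.
    return to_data_uri(data, mime or "application/octet-stream")


def html_data_uri(html):
    """Data URI for a whole HTML document, encoded as UTF-8."""
    return to_data_uri(html.encode("utf-8"), "text/html", charset="utf-8")
```

The string returned by image_data_uri would replace the original value of an img src attribute (or a url(...) in CSS), and html_data_uri would be applied last, to the finished single-file page.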
Tzar
  • Thank you for the answer, however I'm looking for an automated way to achieve this, even if it requires using several applications/commands. I'll wait for another answer. – mat Apr 16 '17 at 18:39
  • Hey Mat, I answered your original inquiry. Nowhere in your question have you emphasized that you were looking for an automated way to perform this action. – Tzar Apr 17 '17 at 11:41
1

Another solution would be to use a web proxy with a custom extension that stores the sources, cf. https://github.com/SommerEngineering/WebProxy

This GitHub project is a simple web proxy that I wrote in Go. In Main.go, the code from line 71 onward copies any data from the original site to your browser.

In your case, you would add a check for whether the data is already stored. If so, load it from disk and send it to your browser. If not, load it from the source and store it on disk.

Your requirement of single-file storage would not be an issue: Go can read and write e.g. ZIP files, cf. https://golang.org/pkg/archive/zip/. If you need these website dumps immediately, a bit of extra code is needed to follow all links so that everything is stored up front.
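
The "already stored or not" check with a single ZIP file as storage could look roughly like this. To be clear about the assumptions: this sketch is in Python rather than Go, purely to keep it short, it is not code from the WebProxy project, and fetch_cached, archive_path and fetch are hypothetical names. The fetch argument stands in for whatever actually downloads the resource.

```python
import zipfile
from pathlib import Path
from urllib.parse import quote


def fetch_cached(url, archive_path, fetch):
    """Return the body for url, using a ZIP archive as a single-file cache.

    fetch is a caller-supplied function (hypothetical) that takes a URL and
    returns the response body as bytes.
    """
    archive = Path(archive_path)
    # Percent-encode the URL so it is a safe, flat member name in the archive.
    member = quote(url, safe="")

    # Already stored? Then load it from disk.
    if archive.exists():
        with zipfile.ZipFile(archive) as zf:
            if member in zf.namelist():
                return zf.read(member)

    # Not stored yet: load it from the source and append it to the archive.
    body = fetch(url)
    with zipfile.ZipFile(archive, "a") as zf:
        zf.writestr(member, body)
    return body
```

A proxy handler would call fetch_cached for every request it forwards, so each resource is downloaded once and served from the archive afterwards.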

Therefore, this answer is not a ready-to-go solution to your question; you would need to write a little code. Go code can be compiled for many platforms (x86, ARM, PPC) and operating systems (Linux, macOS, Windows).

I hope this answer gives you an option.

SommerEngineering
0

There is a Chrome extension, SingleFile, that does exactly this.

Lars Holm Jensen