Convert webarchive to html

Question

I managed to capture the behavior of a complex web site in a webarchive. I would now like to turn that webarchive into a set of nested directories containing the HTML. However, when I tried this both with Waf and with a commercial program bought on the Apple store, all I got was the nested directories with the HTML page at the bottom: no images, no CSS, and no working links. If you are interested, the webarchive document is at:

http://www.miafoto.it/it/GiroMilano.webarchive

while the poor result of the extraction is at:

http://www.miafoto.it/it/Giromilano/Pagine/default.aspx

and the empty directories above it. Apart from looking different, the webarchive behaves like the official web site when a list box value is selected and the button is pushed, while the extracted version loads itself rather than the official page and produces a page with no content. As you can see, the webarchive is over 1 MB, while the extraction is just a little over 1 KB.

What is wrong with it, and how can I perform such an apparently trivial task with usable results?

Thanks,

I discovered that the web site at http://www.atm.it/it/Giromilano/Pagine/default.aspx creates .axd files with embedded, preset JavaScript code inside. What beats me is how Safari is able to pack all of this into its webarchive; that is rivaled only by my astonishment at not being able to tap into that magic. I also tried to download a copy of the full web site with WinHTTrack, but the file was saved as a .html file instead of .aspx. Having focused on Mac and Linux, I could not be more confused. Could someone shed some light? Thanks, Fabrizio – user1785898 Nov 21 '12 at 17:14

score 9 · Answer 1 · answered May 24 '15 at 20:34

textutil -convert html example.webarchive

Be careful: the HTML file and its resource files are created in the same folder as the webarchive!
Also, I had to open the .html file in a text editor and fix the "file:///image.tiff" links (replacing "file:///" with "") so that they point to relative paths.
Also, not all browsers display .tiff images.
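
The manual search-and-replace described above can also be scripted. The following is a minimal sketch, not part of the original answer: it assumes the links appear as `src` or `href` attributes starting with "file:///" and that the generated file is UTF-8 encoded. The function name `relativize_file_links` and the script name in the comment are made up for illustration.

```python
import re

# Matches src="file:/// or href='file:/// and captures the attribute
# name and the opening quote so they can be written back unchanged.
_FILE_URL = re.compile(r'\b(src|href)\s*=\s*(["\'])file:///', re.IGNORECASE)


def relativize_file_links(html: str) -> str:
    """Drop the file:/// prefix so links resolve relative to the .html file."""
    return _FILE_URL.sub(r"\1=\2", html)


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Hypothetical usage: python fix_links.py example.html
    for name in sys.argv[1:]:
        path = Path(name)
        fixed = relativize_file_links(path.read_text(encoding="utf-8"))
        path.write_text(fixed, encoding="utf-8")
```

The script rewrites the file in place, so it is safer to run it on a copy first.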

Who knew we have Stack Overflow wiki?

alexkovelsky

Unfortunately, textutil corrupts the original HTML structure, creating only a visually similar document. If the original DOM structure must be preserved, another tool has to be used. – dond Aug 18 '22 at 09:18
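
One way to keep the original markup byte for byte is to read the archive directly. A .webarchive file is a binary property list whose `WebMainResource`, `WebSubresources`, and `WebSubframeArchives` entries hold each resource's URL and raw data, so Python's `plistlib` can unpack it. The sketch below is an illustration rather than a tested tool: the function names, the host/path output layout, and the `index.html` fallback name are assumptions, and it does not rewrite links inside the extracted HTML.

```python
import plistlib
from pathlib import Path
from urllib.parse import urlparse


def _target_path(out_dir: Path, url: str) -> Path:
    """Map a resource URL to a file under out_dir, laid out as host/path."""
    parsed = urlparse(url)
    rel = parsed.path.lstrip("/")
    if not rel or rel.endswith("/"):
        rel += "index.html"  # assumption: file name for directory-style URLs
    target = (out_dir / (parsed.netloc or "_local") / rel).resolve()
    # Refuse paths that would escape out_dir (for example via "..").
    if out_dir.resolve() not in target.parents:
        raise ValueError(f"unsafe resource path: {url}")
    return target


def extract_webarchive(archive: dict, out_dir: Path) -> list:
    """Write the main resource, subresources, and subframes to out_dir."""
    written = []
    resources = [archive["WebMainResource"]]
    resources += list(archive.get("WebSubresources", []))
    for res in resources:
        target = _target_path(out_dir, res["WebResourceURL"])
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(res["WebResourceData"])  # original bytes, unmodified
        written.append(target)
    # Frames are nested archives with the same structure.
    for frame in archive.get("WebSubframeArchives", []):
        written.extend(extract_webarchive(frame, out_dir))
    return written


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python extract.py example.webarchive output_dir
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as fh:
            data = plistlib.load(fh)
        for path in extract_webarchive(data, Path(sys.argv[2])):
            print(path)
```

Query strings are ignored when naming files, so resources that differ only in their query (as .axd script URLs often do) would overwrite each other; a fuller version would need to fold the query into the file name.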

score 1 · Answer 2 · answered Jun 11 '22 at 02:08

I find that WebArchiveExtractor.app works on my Mac (macOS Mojave): https://robrohan.github.io/WebArchiveExtractor/

user2407486

score 0 · Answer 3 · answered Dec 12 '12 at 11:51

I solved the issue by finding all the parameters that the page submits and submitting them from my own script, ignoring the webarchive.
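
The answer does not show the script, so the following is only a sketch of the general approach under stated assumptions: the page is fetched as HTML, its `<input>` fields (including hidden ones) are collected, the chosen list box value is supplied by the caller, and everything is sent as a form-encoded POST. The names `FormFieldCollector`, `collect_fields`, and `submit_form` are hypothetical, and unchecked checkboxes and `<select>` elements are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Input types whose values are not sent unless the control is activated.
_SKIPPED_TYPES = ("submit", "button", "image", "reset")


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements, including hidden ones."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        kind = (attr.get("type") or "text").lower()
        if name and kind not in _SKIPPED_TYPES:
            self.fields[name] = attr.get("value") or ""


def collect_fields(page_html: str) -> dict:
    """Return the input fields found in the page as a dict."""
    collector = FormFieldCollector()
    collector.feed(page_html)
    return collector.fields


def submit_form(url: str, page_html: str, overrides: dict) -> bytes:
    """POST the page's own fields plus caller-supplied values to url."""
    fields = collect_fields(page_html)
    fields.update(overrides)  # e.g. the list box selection and the button
    body = urlencode(fields).encode("utf-8")
    request = Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(request) as response:
        return response.read()
```

Comparing the collected fields with what the browser's network inspector shows for a real submission is a quick way to find any parameter the sketch misses.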

user1785898

score 0 · Answer 4 · answered Sep 13 '21 at 20:08

To save HTML pages on a Mac, I use Chrome. Download and install it, then save your page as HTML. Safari saves web pages in the webarchive format, which I find very hard to deal with.

edited Nov 02 '21 at 09:49

Fariman Kashani
