I managed to capture the behavior of a complex web site in a webarchive. I would now like to turn that webarchive into a set of nested directories of HTML files. Yet when I tried this, both with Waf and with commercial software bought on the Apple store, all I got was the nested directories with the HTML page at the bottom: no images, no CSS, and no working links. If you are interested, the webarchive document is at:

http://www.miafoto.it/it/GiroMilano.webarchive

while the poor result of the extraction is at:

http://www.miafoto.it/it/Giromilano/Pagine/default.aspx

together with the empty directories above it. Besides looking different, the two behave differently: when a listbox value is selected and the button is pushed, the webarchive behaves like the official web site, while the extracted version loads itself rather than the official page and produces a page with no content. As you may see, the webarchive is over 1 MB while the extraction is just a little over 1 KB.

What is wrong with it, and how can I perform such an apparently trivial task with usable results?

Thanks,

user1785898
  • I discovered that the web site at http://www.atm.it/it/Giromilano/Pagine/default.aspx creates .axd files with embedded, preset JavaScript code inside. What beats me is how Safari manages to pack all of this into its webarchive, and that is rivaled only by my astonishment at not being able to tap into that magic. I also tried to download a copy of the full website with WinHTTrack, but the file came out as a .html file instead of .aspx. Having focused on Mac and Linux, I must say I could not be more confused. Could someone shed some light? Thanks, Fabrizio – user1785898 Nov 21 '12 at 17:14

4 Answers

9
textutil -convert html example.webarchive
  • Be careful: the HTML and its resource files are created in the same folder as the webarchive!
  • Also, I had to open the .html file in a text editor and change the "file:///image.tiff" links (replacing "file:///" with an empty string) so that they point to relative paths.
  • Also, not all browsers display .tiff images.
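
The manual search-and-replace described above could also be scripted. The following is only a minimal sketch, assuming the links appear as `src` or `href` attribute values that start with `file:///`; the function name `make_links_relative` is made up for this illustration and is not part of textutil.

```python
import re

# Matches the opening of a src/href attribute whose value starts with file:///
# (assumption: textutil writes resource links in this form).
_FILE_LINK = re.compile(r'((?:src|href)\s*=\s*["\'])file:///', re.IGNORECASE)


def make_links_relative(html: str) -> str:
    """Drop the file:/// prefix so links resolve relative to the .html file."""
    return _FILE_LINK.sub(r"\1", html)


sample = '<img src="file:///image.tiff"><a href="file:///page2.html">next</a>'
print(make_links_relative(sample))
# prints: <img src="image.tiff"><a href="page2.html">next</a>
```

To apply it to a real file, read the converted .html as text, pass it through the function, and write the result back. Links that do not start with `file:///` are left untouched.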

Who knew we have Stack Overflow wiki?

alexkovelsky
  • Unfortunately, textutil corrupts the original HTML structure, creating a document that is only visually similar. If the original DOM structure must be preserved, another tool has to be used. – dond Aug 18 '22 at 09:18
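
One way to get at the untouched original bytes is to read the archive directly. A Safari .webarchive is a property list, so Python's standard `plistlib` can load it. The sketch below assumes the commonly seen keys `WebMainResource`, `WebSubresources`, `WebResourceURL`, and `WebResourceData`; the function name `extract_webarchive` and the flat, index-prefixed file naming are choices made only for this illustration. It does not rewrite links inside the HTML, so the page will not necessarily work offline as-is.

```python
import plistlib
from pathlib import Path
from urllib.parse import urlparse


def extract_webarchive(archive_path, out_dir):
    """Dump the main page and all subresources of a .webarchive into out_dir.

    Returns a list of (original URL, written file path) pairs.
    """
    with open(archive_path, "rb") as fh:
        archive = plistlib.load(fh)  # handles both binary and XML plists

    # Assumed layout: one main resource plus an optional list of subresources.
    resources = [archive["WebMainResource"]]
    resources += archive.get("WebSubresources", [])

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, res in enumerate(resources):
        url = res["WebResourceURL"]
        # Flat, index-prefixed names avoid collisions and path traversal.
        name = Path(urlparse(url).path).name or "index.html"
        target = out / f"{index:03d}_{name}"
        target.write_bytes(res["WebResourceData"])
        written.append((url, target))
    return written


# Example call (file names are placeholders):
# extract_webarchive("GiroMilano.webarchive", "extracted")
```

The returned URL-to-file pairs could then be used to rewrite the links in the main page by hand or with a further script.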
1

I find that WebArchiveExtractor.app works on my Mac (macOS Mojave): https://robrohan.github.io/WebArchiveExtractor/

0

I solved the issue by finding all the parameters submitted by the page and submitting them myself from my script, ignoring the webarchive altogether.
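
The original script is not shown, so the following is only a hedged sketch of the general idea: collect every `<input>` field of the page (hidden ones included), override the ones a user would set, and build the POST body from the result. The field names `__VIEWSTATE` and `line` in the sample are assumptions for illustration (the former is typical of .aspx pages); `<select>` elements are not parsed, so listbox values have to be supplied through `overrides`.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> elements, hidden ones included."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_post_body(page_html, overrides):
    """Return a URL-encoded body with the page's fields plus the overrides."""
    collector = FormFieldCollector()
    collector.feed(page_html)
    fields = dict(collector.fields)
    fields.update(overrides)  # values the user would have chosen in the form
    return urlencode(fields).encode("ascii")


# Hypothetical page fragment; real field names must be read from the site.
page = ('<form><input type="hidden" name="__VIEWSTATE" value="abc"/>'
        '<input name="line" value=""/></form>')
print(build_post_body(page, {"line": "90"}))
# prints: b'__VIEWSTATE=abc&line=90'
```

The resulting bytes can be sent as the `data` argument of `urllib.request.urlopen` against the form's action URL.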

user1785898
0

To save HTML pages on a Mac, I use Chrome. Download and install it, then save your page as HTML. Safari saves web pages in the webarchive format, which I find very hard to deal with.

Fariman Kashani