I've found myself having to use a Python script to access a web archive.
What I have is a 'megawarc' web archive file from http://archive.org/details/archiveteam-fanfiction-warc-11. I need to un-megawarc it using the Python script found at https://github.com/alard/megawarc.
I'm trying to run the 'restore' command, and I have the three files needed (FILE.warc.gz, FILE.tar, and FILE.json.gz) from the first link.
I have both Python 2.7 and 3.3 installed.
--------------update--------------
I've run both this method:
python megawarc restore FILE
and this method:
Make sure you have the files megawarc and ordereddict.py in the same directory as the files you want to convert.
Rename the file megawarc to megawarc.py.
Open a Python console in this directory.
Type the following code (line by line):
import sys
sys.argv = ['megawarc','restore','FILE']
import megawarc
megawarc.main()
using Python 2.7, and this is what I get:
c:\Python27>python megawarc restore FILE
Traceback (most recent call last):
  File "megawarc", line 563, in <module>
    main()
  File "megawarc", line 552, in main
    mwr.process()
  File "megawarc", line 460, in process
    self.process_entry(entry, tar_out)
  File "megawarc", line 478, in process_entry
    entry["target"]["offset"], entry["target"]["size"])
  File "megawarc", line 128, in copy_to_stream
    raise Exception("End of file: %d bytes expected, but %d bytes read." % (buf_size, l))
Exception: End of file: 4096 bytes expected, but 236 bytes read.
Is there something else I'm missing?
I have the following files, all in c:\python27:
FILE.megawarc.json.gz
FILE.megawarc.tar
FILE.megawarc.warc.gz
megawarc
ordereddict.py
Is this some kind of corrupt file error? Is there something I'm missing?
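
The "End of file" message suggests that one of the container files may be shorter than the index expects, for example because of an incomplete download. Below is a minimal sketch of how that could be checked. It rests on several assumptions: that FILE.megawarc.json.gz holds one JSON object per line, that each object has a "target" entry with "offset" and "size" (those two keys appear in the traceback), that a "container" key names which file the bytes live in (this key is a guess), and that offsets refer to raw byte positions in the files on disk. The function names are made up for illustration, and the sketch needs Python 3.3 rather than 2.7.

```python
import gzip
import json
import os


def required_sizes(index_path):
    """Return, per container, the smallest file length the index implies."""
    needed = {}
    with gzip.open(index_path, "rt", encoding="utf-8") as index:
        for line in index:
            line = line.strip()
            if not line:
                continue
            target = json.loads(line)["target"]
            # "container" is an assumed key; "offset" and "size" are the
            # keys the traceback shows megawarc reading.
            name = target.get("container", "unknown")
            end = target["offset"] + target["size"]
            needed[name] = max(needed.get(name, 0), end)
    return needed


def report_truncation(index_path, container_paths):
    """Return {container: (expected_min_bytes, actual_bytes)} for short files."""
    short = {}
    for name, end in required_sizes(index_path).items():
        path = container_paths.get(name)
        if path is None:
            continue  # no file supplied for this container name
        actual = os.path.getsize(path)
        if actual < end:
            short[name] = (end, actual)
    return short


INDEX = "FILE.megawarc.json.gz"

if __name__ == "__main__" and os.path.exists(INDEX):
    # The container names "warc" and "tar" are assumptions as well.
    paths = {"warc": "FILE.megawarc.warc.gz", "tar": "FILE.megawarc.tar"}
    for name, (end, actual) in report_truncation(INDEX, paths).items():
        print("%s: index expects at least %d bytes, file has %d"
              % (name, end, actual))
```

If this prints a line for either file, comparing the local file sizes with the sizes listed on the archive.org download page would be the next thing to try. If it prints nothing, the files are at least as long as the index requires (under the assumptions above) and the cause is probably something else.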