Firstly, to clarify the question, the aim is to download index.html plus all the requisite parts of that page (images, etc.). The -p option is equivalent to --page-requisites.
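So, for example, these two invocations are equivalent:

wget -p 'http://www.amazon.com/'
wget --page-requisites 'http://www.amazon.com/'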
The reason the page requisites are not always downloaded is that they are often hosted on a different domain from the original page (a CDN, for example). By default, wget refuses to visit other hosts, so you need to enable host spanning with the --span-hosts option.
wget --page-requisites --span-hosts 'http://www.amazon.com/'
If you need to be able to load index.html locally and have all the page requisites load from their local copies, you'll need to add the --convert-links option, so that URLs in img src attributes (for example) are rewritten to relative URLs pointing to the local versions.
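For example, adding --convert-links to the earlier command produces a copy of the page that can be browsed offline:

wget --page-requisites --span-hosts --convert-links 'http://www.amazon.com/'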
Optionally, you might also want to save all the files without a separate directory per host by adding the --no-host-directories option, or save all the files in a single, flat directory by adding the --no-directories option.
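For instance, to keep wget's usual directory layout but drop the per-host prefix, you could run something like:

wget --page-requisites --span-hosts --convert-links --no-host-directories 'http://www.amazon.com/'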
Using --no-directories will result in lots of files being downloaded to the current directory, so you probably want to specify a folder name for the output files using --directory-prefix.
wget --page-requisites --span-hosts --convert-links --no-directories --directory-prefix=output 'http://www.amazon.com/'
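For reference, the same command can also be written with wget's short options (-p for --page-requisites, -H for --span-hosts, -k for --convert-links, -nd for --no-directories and -P for --directory-prefix):

wget -p -H -k -nd -P output 'http://www.amazon.com/'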