
The website loads its assets (JS, CSS, images, etc.) from another domain, and I am not able to download those assets at all.

Say the website is example.com and it includes assets from, say, assets.orange.com.

How do I tell wget to download those assets, save them into separate folders (js, css, images), and convert the links in the downloaded HTML files?

I don't know what I am doing wrong, or where to specify assets.orange.com in this command.

wget \
    --mirror \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains example.com \
    --no-parent \
    example.com
Shivam Singhal
  • Possibly answered here: https://stackoverflow.com/questions/13031147/how-to-download-a-full-website?rq=1 – Paulo Jun 09 '21 at 01:18
  • @Paulo Just tried this but for some reason the CDN site is not getting connected. Here is the error: Connecting to assets.website-files.com (assets.website-files.com)|2600:9000:2181:2e00:11:3b84:d200:93a1|:443... failed: Resource temporarily unavailable. – Shivam Singhal Jun 09 '21 at 01:34
  • Looks like that the problem is not with wget command, but with the server. The server is refusing your connection. I tried assets.website-files.com and it also denied my connection for some reason, without any error code/message. If this webserver is denying all connections it can mean that it is trying to prevent someone to copy its files. – Paulo Jun 10 '21 at 15:38
  • @Paulo you are right. I realized the same. I put in the right CDN address but it's downloading HTML files only. Anyway we can talk over a chat? – Shivam Singhal Jun 11 '21 at 19:06

1 Answer


where to specify assets.orange.com in this command

The wget manual describes the usage of --domains as

-D domain-list
--domains=domain-list

where domain-list is a comma-separated list of domains, so if you wish to specify more than one domain, you should write

--domains=example.com,assets.orange.com

According to the wget manual, if you aim to download all the files that are necessary to properly display a given HTML page, you might use

-p
--page-requisites

Beware that this includes such things as inlined images, sounds, and referenced stylesheets.
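Putting these together, a command along the following lines should fetch the cross-domain assets. Note that by default wget never follows links onto other hosts, so --domains only takes effect once host-spanning is enabled with -H/--span-hosts; the domain names here are the asker's examples, so substitute your own:

```shell
# Mirror example.com, including page requisites served from assets.orange.com.
# --span-hosts lets wget follow links onto other hosts during recursion;
# --domains then restricts that spanning to the listed hosts only.
wget \
    --mirror \
    --page-requisites \
    --convert-links \
    --html-extension \
    --no-parent \
    --span-hosts \
    --domains=example.com,assets.orange.com \
    https://example.com/
```

This will not sort the assets into js/css/images folders by itself; wget preserves each host's own directory layout (e.g. assets.orange.com/...), and --convert-links rewrites the HTML to point at those local paths.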

Daweo
    I used all of these options and none of them are working. `--page-requisites` just downloads HTML files and doesn't download assets. – Shivam Singhal Jun 10 '21 at 12:03