Some Context
After fixing the code of a website to use a CDN (rewriting all the URLs to images, JavaScript & CSS), I need to test all the pages on the domain to make sure all the resources are fetched from the CDN.
All the site's pages are accessible through links; there are no isolated pages.
Question
Is there some automated way to give a domain name and request all pages + resources of the domain?
Answer:
OK, I found I can use wget, like so:
wget -p --no-cache -e robots=off -m -H -D cdn.domain.com,www.domain.com -o site1.log www.domain.com
Options explained:
-p - download page resources too (images, CSS, JavaScript, etc.)
--no-cache - get the real object, do not accept a server-cached copy
-e robots=off - disregard robots and no-follow directives
-m - mirror the site (follow links recursively)
-H - span hosts (follow links to other domains too)
-D cdn.domain.com,www.domain.com - specify which domains to follow, otherwise it will follow every link on the page
-o site1.log - log to the file site1.log
-U "Mozilla/5.0" - optional: fake the user agent; useful if the server returns different content to different browsers
www.domain.com - the site to download
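To then check that the static resources really came from the CDN, one option is to scan the log for resource URLs served from anywhere other than the CDN host. A minimal sketch, assuming GNU Wget's default log format and the example domains above (the extension list is just an illustration, adjust it to your site):

# list any static resources in the log that were NOT fetched from the CDN
grep -Eo 'https?://[^ ]+' site1.log \
  | grep -Ei '\.(png|jpe?g|gif|svg|css|js)(\?|$)' \
  | grep -v 'cdn\.domain\.com' \
  | sort -u

An empty result means every matching resource was fetched from the CDN.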
Enjoy!