Suppose we have a website at http://www.example.com. I would like to get a list of its pages' URIs (just the URL addresses themselves, not the content of those pages), either all of them (including all subdomains and all subpages), or just some of them, provided that they follow a particular globbing and/or regex pattern.
So, for example, I'm looking for something that gets all URLs (just the URL addresses themselves) that follow a pattern such as http://*.example.com/*. I'm aware that globbing in Linux (e.g. via the shell) is (mostly or entirely?) limited to local files and directories (correct me if I'm wrong).
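To make the pattern part concrete, here is a minimal sketch of the kind of filtering I have in mind, assuming I somehow already had the full list of URLs in a hypothetical file `urls.txt` (producing that list is exactly the part I'm asking about). The glob http://*.example.com/* would roughly correspond to a regex like this:

```
# urls.txt is hypothetical: one URL per line, already collected somehow
# (getting that list in the first place is what I'm asking about).
# Keep only the addresses matching the glob http://*.example.com/*
grep -E '^http://[^/]*\.example\.com/' urls.txt
```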
How can I achieve this?
I suppose that something related (although not quite the same?) is discussed here: How to find all links / pages on a website.
P.S. All of the URLs are part of a website that is made of static webpages only. I'm not sure if it's even possible to do the same thing with websites that are made of dynamic webpages... Also, I'm not sure whether any URLs with query strings in them (e.g. http://www.example.com/?=abc&xyz) can be captured at all in such a way.