I'm trying to download an archive of my website, 3dsforums.com, using wget. There are millions of pages I don't want to download, so I'm trying to tell wget to only download pages that match certain URL patterns, but I'm running into some roadblocks.
As an example, this is a URL I would like to download:
http://3dsforums.com/forumdisplay.php?f=46
...so I've tried using the --accept-regex option:
wget -mkEpnp --accept-regex "(forumdisplay\.php\?f=(\d+)$)" http://3dsforums.com
But it just downloads the home page of the website.
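For what it's worth, the wget manual says --accept-regex is matched against the complete URL, and that the regex type defaults to POSIX, which as far as I know doesn't define \d. Two variants I'm considering in case that's the problem (neither confirmed yet):

# assumes the default POSIX engine; [0-9] replaces the possibly unsupported \d
wget -mkEpnp --accept-regex "forumdisplay\.php\?f=[0-9]+$" http://3dsforums.com

# assumes this wget build was compiled with PCRE support
wget -mkEpnp --regex-type pcre --accept-regex "forumdisplay\.php\?f=\d+$" http://3dsforums.com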
The only command that remotely works so far is the following:
wget -mkEpnp --accept-regex "(\w+\.php$)" http://3dsforums.com
This produces the following output:
Downloaded 9 files, 215K in 0.1s (1.72 MB/s)
Converting links in 3dsforums.com/faq.php.html... 16-19
Converting links in 3dsforums.com/index.html... 8-88
Converting links in 3dsforums.com/sendmessage.php.html... 14-15
Converting links in 3dsforums.com/register.php.html... 13-14
Converting links in 3dsforums.com/showgroups.php.html... 14-29
Converting links in 3dsforums.com/index.php.html... 16-80
Converting links in 3dsforums.com/calendar.php.html... 17-145
Converting links in 3dsforums.com/memberlist.php.html... 14-99
Converting links in 3dsforums.com/search.php.html... 15-16
Converted links in 9 files in 0.009 seconds.
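Looking at that list, my "working" pattern only matches URLs that end in .php, i.e. ones with no query string at all, which would explain why none of the forumdisplay.php?f=... pages show up. A looser variant that also allows query strings might look like the following, though it's untested and probably far too broad, since it would accept every .php URL on the site:

# untested sketch: (\?.*)? optionally matches a query string after the .php
wget -mkEpnp --accept-regex "\w+\.php(\?.*)?$" http://3dsforums.com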
Is there something wrong with my regular expressions? Or am I misunderstanding the use of the --accept-regex option? I've been trying all sorts of variations today, but I'm not quite grasping what the actual problem is.
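In case it helps with diagnosis, I can also capture wget's full decision log (assuming my build supports --debug) and look for the URLs being filtered out by the accept rules:

# wget writes its log to stderr; --debug requires a build compiled with debug support
wget -mkEpnp --debug --accept-regex "forumdisplay\.php\?f=[0-9]+$" http://3dsforums.com 2> wget-debug.log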