
No matter how hard I try, I can't seem to get httrack to leave links going to other domains intact. I've tried using the --stay-on-same-domain argument, and that doesn't seem to do it. I've also tried adding a filter, but that doesn't do it either.
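
For reference, this is roughly the kind of command I've been running (a sketch; example.com and the output directory are placeholders):

    # mirror the site, staying on the same domain
    httrack "https://www.example.com/" -O ./mirror --stay-on-same-domain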

There simply must be some option I'm missing here.

– alexgolec

3 Answers

Score: 15

Setting the option "Maximum external depth" to 0 did not work, even though you would expect it to.

What works:

Go to Options > Scan Rules and enter in the text field (on an extra line): -* +*yourdomain.com/*
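
For anyone on the command line, the same scan rules can be passed as trailing filter arguments (a sketch; example.com and ./mirror are placeholders for your domain and output directory):

    # exclude everything (-*), then re-allow only URLs on your own domain
    httrack "https://www.example.com/" -O ./mirror "-*" "+*example.com/*"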

Here are more settings to learn about: HTTrack: How to download folders only from a certain subfolder level?

  • How frustrating to have to manually specify the domain in the scan rules each time. It should really detect that. – Simon East Oct 10 '19 at 04:29
  • When I did this it reduced the number of pages downloaded from other domains - but strangely not to zero. Some pages were still downloaded from other domains. – Joe Aug 12 '20 at 16:20
Score: 1

Set maximum external depth to 0. In the GUI, this can be found here:

[screenshot: the "Maximum external depth" option in HTTrack's settings]

If you are using the command line version, the option is

%e0
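
For example, a minimal sketch of a full invocation (the URL and output directory are placeholders; note the leading dash when the option is passed on the command line):

    # -%e0 sets the depth for external links to 0
    httrack "https://www.example.com/" -O ./mirror -%e0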

[Note: not an expert on HTTRACK, so please correct if necessary]

– thomasB
  • This doesn't always work. I have my settings the same as your screenshot and yet I also get many many pages from Wikipedia. – Simon East Oct 10 '19 at 03:57
Score: -2

In "Set Option" > "Limits", try

Maximum mirroring depth = 1 (set this to 2 if 1 doesn't work)

And

Maximum external depth = 0

Worked for me!!
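
For command-line users, the equivalent flags would be roughly as follows (a sketch; -r sets the mirroring depth, -%e the external depth, and the URL and output directory are placeholders):

    # mirroring depth 1 (use -r2 if -r1 doesn't work), external depth 0
    httrack "https://www.example.com/" -O ./mirror -r1 -%e0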

– Shawn J. Molloy
– Mithilesh