Is it possible (perhaps with a newer XPath version) to get the following working:

//a/@href[not(contains(., "DOMAIN OF THE CURRENT PAGE"))]

DOMAIN OF THE CURRENT PAGE should work like a variable that holds the page's domain, something like {HTTP_HOST}.

I want to get all external links this way.

kjhughes
Evgeniy

1 Answer

If the domain of the current page exists as content of the current page, then, yes, you can select it and use it in an XPath predicate. Otherwise, no, there is no standard, universal variable defined in XPath for the domain of the current page.

Any given XPath hosting language or tool may have its own mechanism for providing the domain of a page. In XPath 3.0, a host might expose it through the standard environment variable functions, fn:environment-variable and fn:available-environment-variables, but whether a variable holding the domain is actually set depends entirely on that host.

Alternatively, you could construct the XPath dynamically within the hosting language that knows the page – see How to pass variable parameter into XPath expression?.
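As a rough illustration of that approach, here is a minimal Python sketch. It assumes the third-party lxml library is installed and that the page URL is already known to the calling code; the names EXTERNAL_HREFS, page_domain and external_links are made up for this example. Rather than pasting the domain into the expression string, it binds it to an XPath variable ($domain), which lxml accepts as a keyword argument.

```python
from urllib.parse import urlsplit

# XPath 1.0 has no built-in notion of the current page's domain, so the
# hosting language supplies it through the $domain variable.
EXTERNAL_HREFS = "//a/@href[not(contains(., $domain))]"


def page_domain(page_url):
    """Return the host part of a URL, e.g. 'example.com' (hypothetical helper)."""
    return urlsplit(page_url).hostname or ""


def external_links(html, page_url):
    """Return href values that do not contain the page's domain."""
    # Assumption: lxml is installed; it is imported here so the helpers
    # above stay usable without it.
    from lxml import html as lxml_html

    tree = lxml_html.fromstring(html)
    # lxml binds keyword arguments to XPath variables of the same name.
    return tree.xpath(EXTERNAL_HREFS, domain=page_domain(page_url))
```

Note that a plain not(contains(...)) test also matches relative links such as /about, since they do not contain the domain either; if only absolute external URLs are wanted, the predicate would need an extra starts-with(., 'http') condition.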

kjhughes
  • Your cited Python method pointed me to this: https://stackoverflow.com/a/2109183/1992004. I think this will be the way to go. Thank you. – Evgeniy Nov 22 '19 at 21:39