I am trying to set up my Netscaler device with a Rewrite Policy. One of my requirements is to replace any non-domain URLs with the home page URL... that is, I want the Netscaler to replace all external links on a page being served from behind the device with the home page's URL (ex: https://my.domain.edu). The type of Rewrite Policy I'm trying to configure uses a PCRE-compliant regex engine to find specific text on a web page (multiple matches possible).
good links:
https://your.page.domain.edu -- won't be replaced
http://good.domain.edu -- also won't be replaced
bad links (should be replaced with home page URL):
https://www.google.com
http://not.the.best.example.org
http://another.bad.example.erewhon.edu
https://my.domain.com
I currently have this pattern:
(https?://)(?![\w.-]+\.domain\.edu)
According to the Netscaler's RegEx evaluation tool this matches the bad links above and doesn't match the good links, so it seems to be working... in fact, when I run this on a test page, the Netscaler finds all the URLs I want to replace and leaves the good URLs alone.
The problem is the Netscaler isn't replacing the URLs the way I want: it replaces the (https?://) group with the home page URL but leaves the remaining part of the bad URL. For example, it replaces http://www.google.com with: https://my.domain.eduwww.google.com
I can configure the Rewrite Policy to replace specific URLs (for example, https://www.google.com), so I know the mechanism works. Obviously, this won't work for the general case.
I've tried enclosing the entire regex in parentheses, but this didn't change anything.
Can a regular expression be written for the general case, to match the entire URL for all domains that aren't mine?
Thanks in advance for any help!