I am having trouble getting a regular expression right. I have an HTML string that contains various links. If a link's href attribute points to the same domain, or to a domain in a list of approved domains, it should be left unchanged. Any other href should be rewritten to point to a redirect page, with the original href passed as a URL parameter.
For example, assume the following domain names are allowed:
domain1, domain2, domain3
and links to disallowed domains should point to "/redirect.htm?url=..."
I would want the following string:
<p>this is a paragraph with
<a href="/index.htm">link 1</a> and
<a href="http://domain4/page.htm">link 2</a> and
<a href="http://www.domain1.com">link3</a> and
<a href="http://www.domain5.com/directory/page.htm">link 4</a>
</p>
to be changed to:
<p>this is a paragraph with
<a href="/index.htm">link 1</a> and
<a href="/redirect.htm?url=domain4/page.htm">link 2</a> and
<a href="http://www.domain1.com">link3</a> and
<a href="/redirect.htm?url=www.domain5.com/directory/page.htm">link 4</a>
</p>
I should also point out that I am using IdocScript, a Java-based custom language for our content management system. I don't need help with that, just with the regular expression.
The best I have come up with so far (which clearly doesn't work) is:
<$ regex = "href=\"(^(/|domain1|domain2|domain3)" $>
<$ regexReplaceAll( originalString, regex, 'href="/redirect.htm?url=$1') $>
Can anyone help?
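For reference, here is a minimal sketch of the behaviour I am after, written in plain Java rather than IdocScript, on the assumption that IdocScript's regex functions follow java.util.regex syntax. It uses a negative lookahead to skip hrefs that start with "/" or with an approved domain, and captures everything after the optional "http://" or "https://" prefix. The class name LinkRewriter, the method rewrite, and the optional "www." handling are my own illustrative choices, not part of any existing API.

```java
import java.util.regex.Pattern;

class LinkRewriter {
    // Hypothetical allow-list, taken from the example domains in the question.
    private static final String APPROVED = "domain1|domain2|domain3";

    // (?!...) rejects hrefs that start with "/" (same site) or with an approved
    // domain, optionally preceded by a scheme and "www.". The \b stops "domain1"
    // from also approving "domain10". Group 1 is the href without its scheme.
    private static final Pattern EXTERNAL_HREF = Pattern.compile(
        "href=\"(?!/|https?://(?:www\\.)?(?:" + APPROVED + ")\\b)"
        + "(?:https?://)?([^\"]*)\"");

    // Rewrites every non-approved href to the redirect page.
    static String rewrite(String html) {
        return EXTERNAL_HREF.matcher(html)
            .replaceAll("href=\"/redirect.htm?url=$1\"");
    }

    public static void main(String[] args) {
        String html = "<a href=\"/index.htm\">link 1</a> and "
            + "<a href=\"http://domain4/page.htm\">link 2</a>";
        System.out.println(rewrite(html));
    }
}
```

This sketch only handles double-quoted href values, treats relative links without a leading "/" as external, and does not URL-encode the captured value, so it would need adjusting if any of those cases matter.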