
OK, here is my situation: I have installed a glossary add-on on my vBulletin forum. When a glossary term appears in a post, the add-on replaces it with a link to the term's glossary definition.

Here is the regex code used by the add-on:

    $findotherterms[] = "#\b$glossaryname\b(?=\s|[.,?!;:]\s)#i";
    $replacelinkterms[] = "<span class=\"glossarycrosslinkimage\"><a href=\"$glossarypath/glossary.php?do=viewglossary&amp;term=$glossaryid\"' onmouseover=\"glossary_ajax_showTooltip('$glossarypath/glossary_crosslinking.php?do=crosslink&term=$glossaryid',this,true);return false\" onmouseout=\"glossary_ajax_hideTooltip()\"><b>$glossaryname&nbsp;</b></a></span>";
    $replacelinkterms[] = "<a href=\"glossary.php?q=$glossaryname\">$glossaryname</a>";
    $glossaryterm = preg_replace($findotherterms, $replacelinkterms, $glossaryterm, $vbulletin->options['vbglossary_crosslinking_limit']);
    return $glossaryterm;

The problem is that if a forum post already contains a link that includes a glossary term, the add-on creates a link inside the link.

So let's say "test" is a glossary term and I have this forum post:

    some forum post including <a href="http://www.test.com">test</a> link

The add-on will convert it to:

    some forum post including <a href="http://www.<a href="glossary.php?q=test">test</a>.com"><a href="glossary.php?q=test">test</a> link

So how can I modify this code so that it does NOT replace anything when the term is found inside an existing link?

Anar Choi
  • Simple answer: don't use [regexes on HTML](http://stackoverflow.com/a/1732454/118068) – Marc B Jul 05 '13 at 16:38
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Barmar Jul 05 '13 at 16:45
  • This is not my script. I didn't write this add-on; I'm just trying to fix it. – Anar Choi Jul 05 '13 at 19:35

1 Answer


Description

It's better to capture both kinds of strings in one regex, the ones you don't want replaced (anything inside an existing link) and the ones you do (bare glossary terms), and then apply some simple logic to decide which matches to replace.

In this case the regex will:

  • find every anchor tag, from the opening <a ...> to the closing </a>. The regex engine scans left to right and reaches the <a of a link before any test inside it, so this alternative consumes the whole link, including any test strings in its URL or link text.
  • find every remaining test string. This portion could be replaced with a |-delimited list of all your glossary terms. The matched term is placed in capture group 1.

    /<a\b(?=\s)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?>.*?<\/a>|(test)/ims
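Rather than hard-coding `test`, the term alternative could be generated from the glossary. A minimal sketch, assuming the terms are available as a plain PHP array (`build_term_alternation()` and `$terms` are hypothetical names, not part of the add-on): each term is escaped with `preg_quote()` so characters like `.` match literally, longer terms are tried first so `test case` wins over its prefix `test`, and `\b` stops `test` from matching inside `contest`. Note that `preg_quote()` does not escape spaces, so this output is meant for a pattern without the `x` flag, which would otherwise ignore them.

```php
<?php
// Hypothetical helper: turn a list of glossary terms into "\b(term1|term2|...)\b"
function build_term_alternation(array $terms): string
{
    // Longer terms first, so "test case" is tried before its prefix "test"
    usort($terms, function (string $a, string $b): int {
        return strlen($b) <=> strlen($a);
    });

    // Escape regex metacharacters and the "/" delimiter in every term
    $escaped = array_map(function (string $term): string {
        return preg_quote($term, '/');
    }, $terms);

    // Word boundaries keep "test" from matching inside "contest" or "testing"
    return '\b(' . implode('|', $escaped) . ')\b';
}

$terms = ['test', 'test case', 'vb.net'];
$alternation = build_term_alternation($terms);
echo $alternation, "\n";
```

With these terms the helper returns `\b(test case|vb\.net|test)\b`, which can stand in for `(test)` in a pattern that uses the same delimiter.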


The PHP logic then replaces a match only if capture group 1 was found; whole-link matches are returned unchanged.
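Checking whether key 1 exists in `$matches` works because, when the winning branch contains no capture group, PHP leaves the unset trailing group out of `$matches` entirely (unless the `PREG_UNMATCHED_AS_NULL` flag is used). A small sketch that records how many elements `$matches` has for each match; the anchor pattern here is deliberately simplified for illustration and is not the answer's full tag regex, and `$shapes` is just an illustrative name:

```php
<?php
$text  = 'a test and <a href="x">test</a>';
// Simplified pattern: a whole anchor, or a bare term captured in group 1
$regex = '/<a\b[^>]*>.*?<\/a>|(test)/is';

$shapes = [];
preg_replace_callback($regex, function (array $matches) use (&$shapes): string {
    // count($matches) is 2 for a term match and 1 for a whole-anchor match
    $shapes[] = count($matches);
    return $matches[0];
}, $text);

print_r($shapes);
```

The bare `test` produces a two-element `$matches`, while the link produces a one-element one, so `$shapes` ends up as `[2, 1]`.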

PHP Example

Live Example: http://ideone.com/jpcqSR

Code

    $string = 'some forum test post including <a href="http://www.test.com">test</a> link';
    $regex = '/<a\b(?=\s)                                  # start of an anchor tag: "<a" followed by whitespace
        (?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"\s]*)*"\s?>    # rest of the opening tag, stepping over attribute values
        .*?<\/a>                                           # link text, lazily, up to the closing tag
        |                                                  # or
        (test)                                             # a bare glossary term, captured in group 1
        /imsx';

    $output = preg_replace_callback(
        $regex,
        function ($matches) {
            // Group 1 is only present when a bare term matched
            if (array_key_exists(1, $matches)) {
                return '<a href="glossary.php?q=' . $matches[1] . '">' . $matches[1] . '</a>';
            }
            // Otherwise a whole existing link matched: return it unchanged
            return $matches[0];
        },
        $string
    );
    echo $output;

Before Replacement

    some forum test post including <a href="http://www.test.com">test</a> link

After Replacement

    some forum <a href="glossary.php?q=test">test</a> post including <a href="http://www.test.com">test</a> link
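The same technique could be packaged closer to what the add-on needs. A sketch under several assumptions, all hypothetical: the function name `crosslink_glossary_terms()`, that terms arrive as an array, and that `$limit` should count only glossary links created (passing a limit straight to `preg_replace_callback()` would also count the skipped links). It adds `\b` around the terms, URL-encodes the term with `urlencode()`, and ends the opening tag with `>` instead of `"\s?>`; as far as I can tell, requiring a `"` there means links whose last attribute value is single-quoted or unquoted are not recognized as links.

```php
<?php
// Hypothetical helper: link glossary terms in $html, skipping existing <a ...>...</a> tags.
// $limit counts glossary links only; -1 means no limit.
function crosslink_glossary_terms(string $html, array $terms, int $limit = -1): string
{
    if ($terms === []) {
        return $html; // nothing to link
    }

    // Longest terms first, each escaped for use inside a "/"-delimited pattern
    usort($terms, function (string $a, string $b): int {
        return strlen($b) <=> strlen($a);
    });
    $quoted = array_map(function (string $t): string {
        return preg_quote($t, '/');
    }, $terms);

    // Branch 1: a whole anchor (opening tag, link text, closing tag).
    // Branch 2: a bare term as a whole word, captured in group 1.
    $regex = '/<a\b(?=\s)(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"\s>]*)*>.*?<\/a>'
           . '|\b(' . implode('|', $quoted) . ')\b/is';

    $made = 0; // glossary links created so far
    return preg_replace_callback($regex, function (array $m) use (&$made, $limit): string {
        // No group 1 means an existing link matched; also stop once the limit is reached
        if (!isset($m[1]) || ($limit >= 0 && $made >= $limit)) {
            return $m[0];
        }
        $made++;
        return '<a href="glossary.php?q=' . urlencode($m[1]) . '">' . $m[1] . '</a>';
    }, $html);
}

$post = "a test, a contest and <a href='http://www.test.com'>test</a> link. Test again.";
echo crosslink_glossary_terms($post, ['test'], 1), "\n";
```

With a limit of 1, only the first bare `test` is linked; `contest` and the single-quoted link are left alone, and the later `Test` is skipped because the limit has been reached.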

Ro Yo Mi