
In PHP I have a string $string and an array $acronyms (in the form "UK" => "United Kingdom").

Now I want to wrap all acronyms within $string in HTML tags. For example, Hello UK should turn into Hello <acronym title="United Kingdom">UK</acronym>

I do it this way:

foreach($acronyms as $acronym => $tooltip){
     $string = preg_replace('/'.$acronym.'/i', '<acronym title="'.$tooltip.'">'.$acronym.'</acronym>', $string);
}

The problem is: say the text is Hello UK and the array replaces "UK" with "United Kingdom" and "Kingdom" with "RandomWord". The result is Hello <acronym title="United <acronym title="RandomWord">Kingdom</acronym>">UK</acronym>, which is obviously broken markup.

So the question is: How do I make my preg_replace only look for the words while they are NOT within an <acronym> tag? (neither in title-attribute, nor within the tag itself)

Edit: second attempt according to a response (because I can't put code in reply). Still the same problem, the text within acronym gets replaced a second time...

foreach($acronyms as $acronym => $tooltip){
        $acronyms[$acronym] = '<acronym title="'.$tooltip.'">'.$acronym.'</acronym>';
}
$string = str_ireplace(array_keys($acronyms), array_values($acronyms), $string);

user2015253
  • This is just like: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Muqito Jan 27 '13 at 10:12
  • Strip all acronyms first, then add them again. – dualed Jan 27 '13 at 11:49
  • Oh, and substitutions like that are usually done either on the client (via JavaScript) or in a way that does not change the source. Then your problem just does not come up. – dualed Jan 27 '13 at 12:10

3 Answers


You can use strtr(). It doesn't rescan the string after performing a replacement:

foreach ($acronyms as $acronym => $tooltip) {
    $acronyms[$acronym] = sprintf('<acronym title="%s">%s</acronym>',
        htmlspecialchars($tooltip),
        htmlspecialchars($acronym)
    );
}

echo strtr($string, $acronyms);
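
To see the "no rescan" behaviour on the data from the question, here is a minimal, self-contained sketch. The sample array and input string are taken from the question; the variable names $pairs and $result are only illustrative.

```php
<?php
// Sample data from the question.
$acronyms = [
    'UK'      => 'United Kingdom',
    'Kingdom' => 'RandomWord',
];
$string = 'Hello UK';

// Turn each tooltip into the full replacement markup.
$pairs = [];
foreach ($acronyms as $acronym => $tooltip) {
    $pairs[$acronym] = sprintf(
        '<acronym title="%s">%s</acronym>',
        htmlspecialchars($tooltip),
        htmlspecialchars($acronym)
    );
}

// One pass over the input: text inserted by a replacement is not scanned
// again, so "Kingdom" inside the new title attribute is left alone.
$result = strtr($string, $pairs);
echo $result, PHP_EOL; // Hello <acronym title="United Kingdom">UK</acronym>
```

Two caveats: strtr() is case-sensitive, unlike the /i regex in the question, and it matches plain substrings, so "UK" inside a longer word such as "UKIP" would be wrapped too.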
Ja͢ck

Here's an attempt at the regex version:

foreach($acronyms as $acronym => $tooltip){
    $rexp = '/' . $acronym . '(?!((?!<acronym).)*<\/acronym>)/i';
    $string = preg_replace($rexp, '<acronym title="'.$tooltip.'">'.$acronym.'</acronym>', $string);
}

Seems to work for me. It does the following:

  1. Match $acronym, followed by a negative lookahead...
  2. that rejects the match if a closing </acronym> tag can be found further on,
  3. unless an opening <acronym tag comes before that closing tag.

Ultimately this matches only where the acronym is not inside an acronym tag (including its attributes, such as the title).

Here's an example of it in action: gSkinner regex example
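
A different regex route, sketched here as an alternative rather than as the approach above, is to combine all acronyms into one alternation and replace them in a single preg_replace_callback() pass, so nothing that was just inserted gets matched again. The helper name wrapAcronyms is hypothetical, and the sketch assumes the input does not already contain acronym tags and that every key starts and ends with a word character (otherwise \b will not behave as intended).

```php
<?php
// Hypothetical helper: wraps every known acronym in one regex pass.
function wrapAcronyms(string $text, array $acronyms): string
{
    if ($acronyms === []) {
        return $text;
    }

    // Lowercased lookup so a case-insensitive match maps back to its tooltip.
    $lookup = array_change_key_case($acronyms, CASE_LOWER);

    // Escape each key so regex metacharacters in an acronym stay literal.
    $quoted = array_map(function ($key) {
        return preg_quote((string) $key, '/');
    }, array_keys($acronyms));

    // \b keeps "UK" from matching inside a longer word.
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($lookup) {
            return sprintf(
                '<acronym title="%s">%s</acronym>',
                htmlspecialchars($lookup[strtolower($m[0])]),
                htmlspecialchars($m[0])
            );
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

echo wrapAcronyms('Hello uk and Kingdom', [
    'UK'      => 'United Kingdom',
    'Kingdom' => 'RandomWord',
]), PHP_EOL;
```

Because the callback receives the text exactly as it was matched, the original casing ("uk") is kept inside the tag while the tooltip comes from the array.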

aaronjbaptiste

Don't try to do everything with regexes:

  1. Parse your HTML using an HTML/XML parsing library.
  2. Iterate over the nodes and replace what you have to replace.
  3. Ask the parsing library to convert the result back to an HTML string.
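
The three steps above could be sketched with PHP's bundled DOM extension roughly as follows. This is one possible reading of the advice, not a definitive implementation: the helper name wrapAcronymsInHtml is hypothetical, the input is assumed to be a UTF-8 HTML fragment, and keys are assumed to start and end with word characters.

```php
<?php
// Hypothetical helper; assumes $html is a UTF-8 fragment, not a full page.
function wrapAcronymsInHtml(string $html, array $acronyms): string
{
    if ($acronyms === []) {
        return $html;
    }

    // Step 1: parse. The wrapper div gives the fragment a single root, and
    // the XML prolog is a common workaround to make loadHTML() read UTF-8.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // One case-insensitive pattern for all keys; the capture group makes
    // preg_split() return the matched acronyms between the plain pieces.
    $lookup = array_change_key_case($acronyms, CASE_LOWER);
    $quoted = array_map(function ($key) {
        return preg_quote((string) $key, '/');
    }, array_keys($acronyms));
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    // Step 2: visit only text nodes that are not already inside <acronym>.
    // Attribute values such as title are not text nodes, so they are skipped.
    $xpath = new DOMXPath($doc);
    $nodes = iterator_to_array(
        $xpath->query('.//text()[not(ancestor::acronym)]', $wrapper)
    );
    foreach ($nodes as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no acronym in this text node
        }
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) { // odd indexes hold the captured acronyms
                $new = $doc->createElement('acronym');
                $new->setAttribute('title', $lookup[strtolower($part)]);
                $new->appendChild($doc->createTextNode($part));
            } else {
                $new = $doc->createTextNode($part);
            }
            $node->parentNode->insertBefore($new, $node);
        }
        $node->parentNode->removeChild($node);
    }

    // Step 3: serialize the wrapper's children back to an HTML string.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapAcronymsInHtml(
    'Hello UK, <acronym title="United Kingdom">UK</acronym>',
    ['UK' => 'United Kingdom', 'Kingdom' => 'RandomWord']
), PHP_EOL;
```

Since the replacement works on DOM nodes instead of on the serialized string, running it a second time over its own output should not nest tags: everything it produced now sits inside acronym elements and is excluded by the XPath filter.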
Julien Palard