I have a custom PHP script that scans text for key phrases and replaces them using preg_replace, but I'm having trouble writing a clean regex for the job.

For example this sentence:

The fox jumped over the gated fence because it did not have hands to open the gate.

I want to bold all instances of gate, but leave words that merely contain it, such as gated, untouched. I'd also like to preserve the original capitalization if possible.

So the end sentence would read like this:

The fox jumped over the gated fence because it did not have hands to open the <b>gate</b>.

Here is the regular expression I came up with. It does not match "gate." because of the period following the word, and it does not preserve the original capitalization.

$description  = preg_replace("/ $keyphrase /", " <b>$keyphrase</b> ", $description, $limit);
atwellpub

1 Answer

// Wrap each whole-word, case-insensitive match in <b> tags, keeping its original case.
$result = preg_replace_callback('/\b'.preg_quote($keyphrase, '/').'\b/i',
    function ($matches) {
        return "<b>" . $matches[0] . "</b>";
    }, htmlspecialchars($subject));

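You need to use word boundaries (\b). The i modifier makes the match case-insensitive, and because the callback returns the matched text itself ($matches[0]), the original capitalization is preserved. Do not forget to:

  • Escape the key phrase for use in the regular expression (use preg_quote).
  • Escape HTML special characters in the sentence (use htmlspecialchars). This is necessary assuming you're turning a plain-text string into an HTML string. If the initial string already contains HTML, do not use a regex.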
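As a minimal sketch of the points above, the snippet below applies them to a sample sentence. The helper name highlight_keyphrase and its $limit parameter are hypothetical, not part of the answer. It uses preg_replace with the $0 backreference instead of a callback, which behaves the same for a fixed wrapper, and it also HTML-escapes the key phrase so that it still lines up with the escaped text (an assumption about what is wanted when the phrase contains characters such as &).

```php
<?php
// Hypothetical helper (not from the original answer): wraps each whole-word,
// case-insensitive occurrence of $keyphrase in <b> tags.
function highlight_keyphrase($text, $keyphrase, $limit = -1)
{
    // Escape the text first, so the only markup in the result is what we add.
    $escaped = htmlspecialchars($text);

    // Assumption: escape the phrase the same way so it matches the escaped text,
    // then neutralise regex metacharacters; '/' is the pattern delimiter.
    $pattern = '/\b' . preg_quote(htmlspecialchars($keyphrase), '/') . '\b/i';

    // $0 is the text exactly as matched, so the original capitalization is kept.
    return preg_replace($pattern, '<b>$0</b>', $escaped, $limit);
}

$sentence    = 'Gate or no gate, the fox jumped over the gated fence.';
$highlighted = highlight_keyphrase($sentence, 'gate');
echo $highlighted, "\n";
// <b>Gate</b> or no <b>gate</b>, the fox jumped over the gated fence.

// Why existing HTML is a problem: run directly on markup, the same pattern
// also rewrites text inside attributes.
$broken = preg_replace('/\bgate\b/i', '<b>$0</b>', '<a href="/gate">the gate</a>');
echo $broken, "\n";
// <a href="/<b>gate</b>">the <b>gate</b></a>
```

One caveat with this sketch: after escaping, a key phrase such as amp or quot would also match inside entities like &amp;, so such phrases would need separate handling.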

EDIT: PHP < 5.3 does not support anonymous functions, so there you must use a named callback:

function cb($matches) {
    return "<b>" . $matches[0] . "</b>";
}
$result = preg_replace_callback('/\b'.preg_quote($keyphrase, '/').'\b/i', 'cb',
    htmlspecialchars($subject));
Artefacto
  • Hello Artefacto, why should I avoid strings with HTML? It's highly likely that the strings will contain divs, tables, links, and images. I can parse the images and links out and re-add them afterwards, but the tables and divs will still be there occasionally. – atwellpub Aug 23 '10 at 00:43
  • @atwell See here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Artefacto Aug 23 '10 at 01:30
  • Hey Art, I'm test-running your regex and it's returning this error: Parse error: syntax error, unexpected T_FUNCTION. I'm out of my league when it comes to using functions to create result sets. Could you advise? Maybe it's because I'm already within a function? – atwellpub Aug 23 '10 at 02:58
  • @atwell That's because you're not running PHP 5.3. See my edit. – Artefacto Aug 23 '10 at 03:30
  • Thanks again, Art. Question: how can I pass variables into the cb function without using globals? The reason is that I'll define start and end codes like <b> and </b> depending on a setting associated with the keyphrase. So if keyphrase 1 is set up to be bolded, I would need to pass $start_tag, which has dynamically been set to <b>, and $end_tag, which has been set to </b>. – atwellpub Aug 23 '10 at 16:52
  • @atwellpub See http://stackoverflow.com/questions/3527386/php-passing-more-than-one-argument-to-a-callback-function-using-php-5-2/3527532#3527532 – Artefacto Aug 23 '10 at 17:14
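For the question in the comments about passing $start_tag and $end_tag into the callback without globals, here is a minimal sketch of two common approaches. It is not necessarily what the linked answer proposes. The class name TagWrapper, its wrap method, and the sample $subject are hypothetical. The object-based form uses no closures, so it should also work on PHP 5.2; the closure form needs PHP 5.3 or later.

```php
<?php
// Hypothetical class for illustration: carries the tags as object state, so the
// callback needs no globals. It uses no closures, so PHP 5.2 can run it.
class TagWrapper
{
    private $start_tag;
    private $end_tag;

    public function __construct($start_tag, $end_tag)
    {
        $this->start_tag = $start_tag;
        $this->end_tag   = $end_tag;
    }

    // preg_replace_callback passes the match array; index 0 is the full match.
    public function wrap($matches)
    {
        return $this->start_tag . $matches[0] . $this->end_tag;
    }
}

$keyphrase = 'gate';
$subject   = 'Open the Gate, not the gated fence.';
$pattern   = '/\b' . preg_quote($keyphrase, '/') . '\b/i';

// array($object, 'method') is accepted as a callback.
$wrapper     = new TagWrapper('<i>', '</i>');
$with_object = preg_replace_callback($pattern, array($wrapper, 'wrap'), htmlspecialchars($subject));

// PHP 5.3+ alternative: a closure that captures the tags with `use`.
$start_tag    = '<b>';
$end_tag      = '</b>';
$with_closure = preg_replace_callback(
    $pattern,
    function ($matches) use ($start_tag, $end_tag) {
        return $start_tag . $matches[0] . $end_tag;
    },
    htmlspecialchars($subject)
);

echo $with_object, "\n";  // Open the <i>Gate</i>, not the gated fence.
echo $with_closure, "\n"; // Open the <b>Gate</b>, not the gated fence.
```

Either way the tags travel with the callback itself, so each key phrase can get its own pair without touching global state.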