
In a PHP variable I have some text that contains some keywords. These keywords are currently capitalised. I would like them to remain capitalised and be wrapped in curly brackets, but only once. I am trying to write upgrade code, but each time it runs it wraps the keywords in another set of curly brackets.

What regex do I need to match the keyword on its own, without also matching it when it is already written as {KEYWORD}?

For example, the text variable is:

$string = "BLOGNAME has posted COUNT new item(s),

TABLE

POSTTIME AUTHORNAME

You received this e-mail because you asked to be notified when new updates are posted.
Best regards,
MYNAME
EMAIL";

And my upgrade code is:

$keywords = array('BLOGNAME', 'BLOGLINK', 'TITLE', 'POST', 'POSTTIME', 'TABLE', 'TABLELINKS', 'PERMALINK', 'TINYLINK', 'DATE', 'TIME', 'MYNAME', 'EMAIL', 'AUTHORNAME', 'LINK', 'CATS', 'TAGS', 'COUNT', 'ACTION');
foreach ($keywords as $keyword) {
    $regex = '|(^\{){0,1}(\b' . $keyword . '\b)(^\}){0,1}|';
    $replace = '{' . $keyword . '}';
    $string = preg_replace($regex, $replace, $string);
}

My regex is currently not working well at all: it is stripping some spaces and, on each run, placing more curly brackets around most (but not all) keywords. What am I doing wrong? Can someone correct my regex?

mario
  • Just a suggestion. You might be able to improve efficiency slightly by putting it into one regular expression and saying ... `(BLOGNAME|BLOGLINK|TITLE|POST|etc)` instead of checking each one individually. – Jeff Parker May 19 '11 at 09:39
  • How would the PHP code know which word it is replacing though? – Matt Robinson May 19 '11 at 12:19
  • In the context of a regex replace, when you put an expression in parentheses, you 'capture' the value to which that expression refers. You can then refer to the captured value in the replacement string using `$x`, where 'x' is the index of the captured expression: $1 for the first capture, $2 for the second, and so on. Example: `preg_replace("/.* Customer#([0-9]+)/", "I captured the number $1", "This is for Customer#1234");` would return 'I captured the number 1234'. – Jeff Parker May 19 '11 at 12:44
  • If you do the same for the expression given above `(BLOGNAME|BLOGLINK|etc)`, then you can refer to that captured value, your keyword, as $1 in the replacement string. Something like: `preg_replace($expression, '{$1}', $string);` (single quotes, because in a double-quoted string PHP treats `{$` as the start of its curly interpolation syntax). You might need to use a value greater than 1, depending on how many captures your chosen expression is doing. – Jeff Parker May 19 '11 at 12:53
  • The latest answer shows how this could be done. – Jeff Parker May 19 '11 at 12:58
  • I can't get this approach to work for me so I'll stick with the old way :-) I suspect it may be to do with there being an unknown number of keywords in the initial text string. – Matt Robinson May 19 '11 at 13:21
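Jeff Parker's capture-group explanation can be checked with a short sketch. It first runs his Customer# example, then applies the same idea to an alternation built with implode() from an array, so the pattern works however many keywords there are. The keyword subset and the sample sentence are made up for illustration.

```php
<?php
// Jeff Parker's example: ([0-9]+) captures the digits and $1 refers back to them.
$out = preg_replace('/.* Customer#([0-9]+)/', 'I captured the number $1', 'This is for Customer#1234');
echo $out, "\n";

// Same idea with a keyword alternation built from an array (illustrative subset),
// so the number of keywords does not matter: $1 is whichever keyword matched.
$keywords = array('BLOGNAME', 'POST', 'POSTTIME', 'COUNT');
$pattern  = '/\b(' . implode('|', $keywords) . ')\b/';

// Single quotes: in a double-quoted string PHP would treat "{$" as interpolation syntax.
$braced = preg_replace($pattern, '{$1}', 'BLOGNAME has posted COUNT new item(s) at POSTTIME');
echo $braced, "\n";
```

This plain version is not idempotent: running it a second time turns {BLOGNAME} into {{BLOGNAME}}, which is what the lookaround assertions in the answers guard against.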

3 Answers


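You are looking for negative lookaround assertions. They are not written with the ^ syntax used inside character classes (outside a character class, ^ is a start-of-string anchor, so (^\{) can only match a brace at the very beginning of the text), but as (?<!...) for a negative lookbehind and (?!...) for a negative lookahead. In your case: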

'|(?<!\{)(\b' . $keyword . '\b)(?!\})|';
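To see the difference, here is a minimal sketch that runs the question's pattern and this lookaround pattern over the same sample text twice. The helper name wrap_all() and the shortened keyword list are assumptions for illustration only.

```php
<?php
// Hypothetical helper: one preg_replace per keyword, like the question's loop.
function wrap_all(string $text, array $keywords, bool $useLookarounds): string
{
    foreach ($keywords as $keyword) {
        $regex = $useLookarounds
            // lookaround version: keyword not preceded by '{' and not followed by '}'
            ? '|(?<!\{)(\b' . $keyword . '\b)(?!\})|'
            // question's version: outside [...], ^ is a start-of-string anchor
            : '|(^\{){0,1}(\b' . $keyword . '\b)(^\}){0,1}|';
        $text = preg_replace($regex, '{' . $keyword . '}', $text);
    }
    return $text;
}

$keywords = array('BLOGNAME', 'POST', 'POSTTIME', 'COUNT'); // illustrative subset
$text = 'BLOGNAME has posted COUNT new item(s) at POSTTIME';

$fixedOnce  = wrap_all($text, $keywords, true);
$fixedTwice = wrap_all($fixedOnce, $keywords, true);  // already braced, so unchanged

$oldOnce  = wrap_all($text, $keywords, false);
$oldTwice = wrap_all($oldOnce, $keywords, false);     // braces pile up

echo $fixedTwice, "\n";
echo $oldTwice, "\n";
```

With the old pattern the keyword at the very start of the text behaves differently: there (^\{) can match, so the existing { is swallowed and the result is {BLOGNAME}} rather than {{BLOGNAME}}, which may explain the "most (but not all)" symptom.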
mario
  • It will only work if the keywords contain no regex special characters; otherwise escape them with preg_quote() first.
  • The (A1) lines can be removed from the regex if the source text can never already contain {KEYWORD}, or if you actually want existing braces to be wrapped again (for example, turning {KEYWORD} into {{KEYWORD}} for formatting).

$text = <<<EOF
BLOGNAME has posted COUNT new item(s),

TABLE

POSTTIME AUTHORNAME

You received this e-mail because you asked to be notified when new updates are posted.
Best regards,
MYNAME
EMAIL
EOF;

$aKeywords = array('BLOGNAME', 'BLOGLINK', 'TITLE', 'POST', 'POSTTIME', 'TABLE', 'TABLELINKS', 'PERMALINK', 'TINYLINK', 'DATE', 'TIME', 'MYNAME', 'EMAIL', 'AUTHORNAME', 'LINK', 'CATS', 'TAGS', 'COUNT', 'ACTION');
$keywords = implode('|', $aKeywords);

$reSrch = '/
            (?<!\{)             # (A1) previous character is not {
            \b                  # start of word
            ('.$keywords.')     # list of keywords (captured as group 1)
            \b                  # end of word
            (?!\})              # (A1) next character is not }
            /xm';               //  x - ignore whitespace and allow # comments in the regex; m - multiline (not needed here, no ^ or $ anchors)

$reRepl = '{\1}';

$result = preg_replace($reSrch, $reRepl, $text);

echo '<pre>';
// echo '$reSrch:'.$reSrch.'<hr>';
echo $result.'<br>';
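If a keyword could ever contain regex metacharacters (the caveat in the first bullet), each one can be escaped with preg_quote() before the alternation is built. Below is a sketch of this answer's approach packaged as a function; the name brace_keywords() is hypothetical, and the lookahead uses (?!\}) so the character after the keyword is checked against the closing brace.

```php
<?php
// Hypothetical helper: wraps each listed keyword in braces, skipping ones already braced.
function brace_keywords(string $text, array $keywords): string
{
    // Escape regex metacharacters (and the '/' delimiter) in every keyword.
    $quoted = array_map(function ($k) { return preg_quote($k, '/'); }, $keywords);

    $pattern = '/(?<!\{)'                             // previous character is not {
             . '\b(' . implode('|', $quoted) . ')\b' // one keyword, as a whole word
             . '(?!\})/';                            // next character is not }

    return preg_replace($pattern, '{$1}', $text);
}

$keywords = array('BLOGNAME', 'BLOGLINK', 'TITLE', 'POST', 'POSTTIME', 'TABLE', 'TABLELINKS',
    'PERMALINK', 'TINYLINK', 'DATE', 'TIME', 'MYNAME', 'EMAIL', 'AUTHORNAME', 'LINK',
    'CATS', 'TAGS', 'COUNT', 'ACTION');

$text  = "BLOGNAME has posted COUNT new item(s),\n\nTABLE\n\nPOSTTIME AUTHORNAME\n\nBest regards,\nMYNAME\nEMAIL";
$once  = brace_keywords($text, $keywords);
$twice = brace_keywords($once, $keywords); // a second run leaves the text unchanged

echo $once, "\n";
```

Because \b has to match on both sides, POST does not fire inside POSTTIME, and the regex engine backtracks to the longer alternative.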
Dmitrij Golubev
  • Initially I could not get this to work, but that must have been down to me as I've now got it working. Since my original code was going to cycle through 18 keywords and do 6 preg_replace calls each time, I suspect your approach will be a lot more efficient, so my thanks to you. – Matt Robinson May 19 '11 at 15:30
  • @Matt Robinson: You're welcome. Ask if you need an explanation. – Dmitrij Golubev May 19 '11 at 16:25

Why regex? Just use str_replace:

foreach ($keywords as $k) {
  $string = str_replace($k, '{'.$k.'}', $string);
}
Félix Saparelli
  • Because it won't work properly if you have overlapping keys. `str_replace("POST", "{POST}", $string);` would turn `{POSTTIME}` into `{{POST}TIME}`, which is bad. – Jeff Parker May 19 '11 at 09:34
  • Ah, right. And if you add spaces to `$k` like `" $k "`? Hmm, it won't match when the keyword is at the beginning or end of the string. Bah. – Félix Saparelli May 19 '11 at 09:39
  • You've got it. Regular expressions are often misused and evil, but here they're kind of justified (even if the intended use is a tiny bit strange) :) – Jeff Parker May 19 '11 at 09:42
  • @Jeff, when you say the usage is a tiny bit strange, are you thinking that there is a better way? I'm all ears! – Matt Robinson May 19 '11 at 10:33
  • No, it's just that adding braces to keywords that are already defined seems a bit pointless. Obviously you'll be replacing these with actual values, but in that case, why bother to put braces on? – Félix Saparelli May 19 '11 at 10:35
  • User feedback, sadly. I have some users who insist on putting these keywords in their content in places where they don't want them replaced. DATE is one of the keywords, and some people like to capitalise the word to draw attention to it but don't want it substituted. So I'm having to brace all the keywords :( – Matt Robinson May 19 '11 at 10:39
  • So why not brace them directly in the template? You should have some kind of way to differentiate keywords from text anyway. – Félix Saparelli May 19 '11 at 10:41
  • That's the aim of this code, it's part of a plugin for WordPress - the templates already exist for many users so I want to make upgrading as painless as possible. The templates will vary between sites. – Matt Robinson May 19 '11 at 10:43
  • That was my thought, but what you've said makes sense. I just had the one suggestion, which I've added as a comment of the main question ... but to summarise, rather than doing X number of regex replace operations, you can do it in one. – Jeff Parker May 19 '11 at 11:14
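The overlap problem raised in these comments can be reproduced with a minimal sketch comparing str_replace() against a whole-word regex that uses the same lookarounds as the other answers. The two-keyword list is chosen only to trigger the overlap.

```php
<?php
$keywords = array('POST', 'POSTTIME'); // illustrative pair that overlaps

// str_replace() matches substrings, so POST is found inside POSTTIME.
$viaStr = 'POSTTIME';
foreach ($keywords as $k) {
    $viaStr = str_replace($k, '{' . $k . '}', $viaStr);
}

// Jeff Parker's case: an already-braced keyword gets mangled.
$mangled = str_replace('POST', '{POST}', '{POSTTIME}');

// \b requires a whole word; the lookarounds skip keywords already in braces.
$pattern  = '/(?<!\{)\b(' . implode('|', $keywords) . ')\b(?!\})/';
$viaRegex = preg_replace($pattern, '{$1}', 'POSTTIME');
$again    = preg_replace($pattern, '{$1}', $viaRegex);

echo $viaStr, ' | ', $mangled, ' | ', $viaRegex, ' | ', $again, "\n";
```

Unlike padding the search string with spaces (" $k "), \b also matches at the very start and end of the text, so keywords there are still found.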