1

I have noticed that the word boundary \bword\b does not work inside brackets when doing a preg_replace() in PHP.

Specifically, I'm trying to exclude the full word &gt; (which stands for > in HTML), but since the word boundary does not trigger inside brackets, as in [^\b&gt;\b], any of those characters by itself, like g or &, is treated as a non-match. If I do the match outside the brackets, \b works as expected in PHP, even though the word starts with &, a non-word character.

Any thoughts/ideas to get around this situation?

Amal Murali
  • That is because, inside the brackets (which is called a character class, by the way), `&gt;` is not a single entity any more. It is a list of characters: `&`, `g`, `t`, and `;`. `[^\b&gt;\b]` will match anything that is not one of the characters above, or a word boundary. (`\b` being repeated twice is redundant, and has no effect on the end result whatsoever). – Amal Murali Jun 17 '14 at 06:36
  • What's the solution then? I need to not-match the whole word, not the individual characters. – the dead tree . Jun 17 '14 at 06:37
  • Your use of *two* `\b`s inside the square brackets suggests you don't know what square brackets are for. I'd suggest a regex tutorial. – Biffen Jun 17 '14 at 06:37
  • I have to use [] in this case because I'm doing an "all characters but these" condition, which requires me to start with [^ – the dead tree . Jun 17 '14 at 06:40
  • @ᴹᴬᴺᴰᴿᴬᴷᴱ: How exactly is that question a duplicate? – Amal Murali Jun 17 '14 at 06:49
  • @thedeadtree.: Just a formatting tip. For inline code formatting, you can wrap the piece of code in backticks: `foo bar &gt; baz bak`, and it will be displayed as it is. You can view the current markdown for the question by clicking [*Edit*](http://stackoverflow.com/posts/24257055/edit). – Amal Murali Jun 17 '14 at 06:58
  • @thedeadtree. A negated character class means ~ "one character that is not one of the specified", thus `[^\b&gt;\b]` is the same as `[^;&\bgt]` (although I'm not sure `\b` will work inside a class), i.e. not a `;` *or* an `&`, and so on, since the order of the characters doesn't matter. If you want to negate a *sequence* of characters you should use [negative look-around](http://www.regular-expressions.info/lookaround.html). – Biffen Jun 17 '14 at 07:26
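
The point made in the comments can be checked with a small sketch. The sample string is an assumption made up for illustration; the patterns are the ones quoted above. It suggests that the class treats `&`, `g`, `t` and `;` as individual characters regardless of order, and that PCRE reads `\b` inside a class as the backspace character rather than a word boundary.

```php
<?php
// Hypothetical sample input containing the literal entity text "&gt;"
// plus stray ";", "g", "&" and "t" characters.
$input = 'a &gt; b; g & t';

// Both classes list the same set of characters, so they behave identically:
// everything except & g t ; (and backspace) is removed.
$fromQuestion = preg_replace('/[^\b&gt;\b]/', '', $input);
$reordered    = preg_replace('/[^;&\bgt]/', '', $input);

// Inside a character class, PCRE interprets \b as backspace (0x08),
// not as a word boundary.
$matchesBackspace = preg_match('/[\b]/', "\x08");
$matchesBoundary  = preg_match('/[\b]/', 'a b');

var_dump($fromQuestion, $reordered, $matchesBackspace, $matchesBoundary);
```

Both replacements leave `&gt;;g&t`: the standalone `;`, `g`, `&` and `t` survive along with the entity, which is why the class cannot single out the whole sequence.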

2 Answers

1

To exclude in PHP, (*SKIP)(*F) is your friend

In PHP, excluding anything is frighteningly simple thanks to the powerful (*SKIP)(*F) syntax (also available in Perl).

To exclude &gt; and match something else, you can just do this:

&gt;(*SKIP)(*F)|something_else

The left side of the alternation | matches a complete &gt; and then deliberately fails, after which the engine skips to the position in the string just past that match. The right side matches something_else, and we know that it is not &gt; because it was not matched by the expression on the left. Just make sure that something_else is not something generic such as .*, as that could roll over all the following &gt; instances. For instance, here, \w+ would be a perfectly fine pattern for something_else, as it does not clash with &gt;.
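
As a minimal sketch of the technique described above, assuming `\w+` as the something_else pattern and a made-up input string:

```php
<?php
// Hypothetical input: words mixed with the literal entity text "&gt;".
$input = 'foo &gt; bar gt &gt; baz';

// Left branch consumes a whole "&gt;" and then fails, so the engine resumes
// after it; right branch matches words everywhere else.
$pattern = '/&gt;(*SKIP)(*F)|\w+/';

// Collect every word that is not part of an "&gt;" entity.
preg_match_all($pattern, $input, $matches);

// Replace those words while leaving the entities untouched.
$replaced = preg_replace($pattern, 'X', $input);

print_r($matches[0]);
echo $replaced, PHP_EOL;
```

Here the matches are foo, bar, gt and baz: the standalone gt is matched, while the gt inside each &gt; is skipped. A plain `\w+` with no left branch would also pick up the gt inside the entities.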

Further reading about this and other techniques to exclude patterns in regex

How to match (or replace) a pattern except in situations s1, s2, s3...

Community
zx81
0

One solution to my own question is: instead of doing a [^word] condition, check if the word/sentence I want is not immediately followed by the word I don't want. As in:

>(?!>)

For my particular case, it worked.
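
A rough sketch of this lookahead idea, using a hypothetical wanted word `foo` (not from the original question) that must not be immediately followed by &gt;:

```php
<?php
// Hypothetical input: "foo" appears both before "&gt;" and on its own.
$input = 'foo&gt; foo bar foo&gt;';

// (?!&gt;) is a negative lookahead: it succeeds only when the text at this
// position does not start with the full sequence "&gt;".
$pattern = '/foo(?!&gt;)/';

$count    = preg_match_all($pattern, $input, $matches);
$replaced = preg_replace($pattern, 'X', $input);

echo $count, PHP_EOL;
echo $replaced, PHP_EOL;
```

Only the middle foo is matched, because the lookahead tests the whole sequence rather than single characters, which is the difference from a negated character class.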

Amal Murali