
I am trying to match a string within the word boundary.

preg_match('/\bTORUK Cirque du Soleil\b/ims',
           'Show: TORUK Cirque du Soleil with Lady Gaga', $matches);

Output: TORUK Cirque du Soleil

This works perfectly. But when the string contains quotes, it doesn't work as expected. For example,

preg_match('/\bTORUK "Cirque du Soleil"\b/ims',
           'Show: TORUK "Cirque du Soleil" with Lady Gaga', $matches);

It doesn't match at all. Expected output in this case is TORUK "Cirque du Soleil".

I tried using \B (a non-word boundary) instead, but that breaks on strings that have no quotes.

I have created a fiddle here.

Samir Selia
  • Quote manual: _“A word boundary is a position in the subject string where the current character and the previous character do not both match \w or \W (i.e. one matches \w and the other matches \W)”_ - with `"[space] ` you don’t have that. – misorude Dec 17 '18 at 12:33
  • That's right, in case of double quotes word boundary doesn't fall in. Any work around for these type of cases? – Samir Selia Dec 17 '18 at 12:35
  • Why are you using `\b`? it's not a word boundary. – Andreas Dec 17 '18 at 12:36
  • Why do you need `\b` here at all? Do you expect sth. like `ABCTORUK` ? – Jan Dec 17 '18 at 12:36
  • I'm using word boundary to prevent sub-string from being matched. E.g. if the string is `ShowTORUK Cirque du Soleil with Lady Gaga`, it shouldn't match in this case. – Samir Selia Dec 17 '18 at 12:40
  • Look for lookarounds instead of word boundaries `preg_match('~(?<!\w)$str(?!\w)~', ...);` – revo Dec 17 '18 at 12:51
  • So `TORUK "Cirque du Soleil"` and `TORUK Cirque du Soleil` should match? At what point can there be quotes, only before `Cirque` and after `Soleil`? – user3783243 Dec 17 '18 at 12:58
  • @user3783243, It's a sample string. There can be any variation of it. The position of the quotes is not fixed, and quotes may or may not be present. – Samir Selia Dec 17 '18 at 13:01
  • @revo, lookaround is working. Can you please explain what `(?<!\w)` and `(?!\w)` do? – Samir Selia Dec 17 '18 at 13:04
  • The former is a negative lookbehind that ensures there is no preceding *word character* and the latter with no `<` at beginning ensures there is no following *word character*. You may change them to `(?<!\S)` and `(?!\S)` if you mean the `$str` shouldn't be preceded or followed by any non-whitespace characters and not just word characters (`[a-zA-Z0-9_]`) – revo Dec 17 '18 at 13:08
  • @Samir Can you show how you used the looks? With your 4 examples that should match I can't get it to function as you described still. https://regex101.com/r/9Ea96M/1/ – user3783243 Dec 17 '18 at 13:15
  • @user3783243 You used two `(?<!\w)`. One should be `(?!\w)`. – revo Dec 17 '18 at 13:17
  • @revo Ah, thanks... and it looks like the demo has a typo in it: `Times Presents: TORUK Cirque du Soleil with Lady Gaga` is listed as both supposed to match and not supposed to match. I guess it was just supposed to match. – user3783243 Dec 17 '18 at 13:23
  • @revo, thanks for the lookaround solution :) – Samir Selia Dec 17 '18 at 13:37
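
The lookaround suggestion from the comments can be sketched as follows. This is a minimal illustration, not code posted in the thread: the helper name `matchWholePhrase` is hypothetical, and the use of `preg_quote` is an added assumption so that any regex metacharacters in the phrase are treated literally. The first check shows why the original `\b` pattern fails: after the closing quote comes a space, and both are non-word characters, so there is no word boundary at that position.

```php
<?php
// Hypothetical helper (not from the thread): return the matched phrase only
// when it is not directly preceded or followed by a word character.
function matchWholePhrase(string $needle, string $haystack): ?string
{
    // preg_quote() escapes regex metacharacters; '/' is our delimiter.
    $pattern = '/(?<!\w)' . preg_quote($needle, '/') . '(?!\w)/i';
    return preg_match($pattern, $haystack, $m) === 1 ? $m[0] : null;
}

$quoted = 'Show: TORUK "Cirque du Soleil" with Lady Gaga';

// The original pattern: \b cannot match between '"' and ' '.
$withBoundary = preg_match('/\bTORUK "Cirque du Soleil"\b/i', $quoted);
var_dump($withBoundary); // int(0)

// Lookarounds only forbid adjacent word characters, so quotes are fine.
var_dump(matchWholePhrase('TORUK "Cirque du Soleil"', $quoted));

// The phrase glued to a preceding word is rejected.
var_dump(matchWholePhrase('TORUK Cirque du Soleil',
                          'ShowTORUK Cirque du Soleil with Lady Gaga')); // NULL
```

Swapping `\w` for `\S` in both lookarounds, as revo mentions, would additionally reject the phrase when it touches punctuation rather than whitespace.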

2 Answers


First, as @misorude pointed out, you need to specify proper delimiters, e.g. /. Second, you can specify to match a word boundary or a quote - something like this:

preg_match('/\bTORUK "?Cirque du Soleil("|\b)/',
           'Show: "TORUK Cirque du Soleil with Lady Gaga"', $matches);

Note that this deals with the specific example you provided and you may need to adjust the code accordingly.
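
To see how that pattern behaves on the variations discussed in the question, here is a small check. The three subject strings are taken from the question and its comments; the loop and the variable names are only illustrative.

```php
<?php
// The pattern from the answer above, applied to three variations.
$pattern = '/\bTORUK "?Cirque du Soleil("|\b)/';

$subjects = [
    'Show: TORUK "Cirque du Soleil" with Lady Gaga', // quoted
    'Show: TORUK Cirque du Soleil with Lady Gaga',   // unquoted
    'ShowTORUK Cirque du Soleil with Lady Gaga',     // glued to a preceding word
];

$results = [];
foreach ($subjects as $subject) {
    // $m[0] holds the full match; null marks "no match".
    $results[] = preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}
var_dump($results);
```

The quoted and unquoted subjects match, and the glued one does not. Because the optional quote on each side is independent, the pattern also accepts unbalanced quotes, which may or may not matter for your data.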

Aleks G

You don't need (and shouldn't use) the \b.
You have a sentence not a word.

preg_match('/TORUK "Cirque du Soleil"/ims',
       'Show: TORUK "Cirque du Soleil" with Lady Gaga', $matches);
var_dump($matches);

output:

array(1) {
  [0]=>
  string(24) "TORUK "Cirque du Soleil""
}

To answer your comment:
Use a word boundary on the first and last words only:

preg_match('/\bTORUK\b "Cirque du \bSoleil\b"/ims',
       'showTORUK "Cirque du Soleil" with Lady Gaga', $matches);
var_dump($matches);

https://3v4l.org/bIW4i

Andreas