I am looking for a regex pattern that matches any word that contains XYZ and does not start with a colon (:).

For example, in This isXYZ a :exampleXYZa I would like to match only isXYZ.

My first idea was to use this regex pattern:

/(?<!\:[^\s\r\t\n])XYZ/

Basically, this is a negative lookbehind to ensure that XYZ is not preceded by a colon followed by non-whitespace characters. However, this does not work, because a lookbehind assertion must have a fixed length.

EDIT: I would also like to have UTF-8 support.

Adam

1 Answer

You can use a regex like the one below:

/\b((?<!:)\w*XYZ\w*)\b/ui
  • The \b at the start and at the end each match a word boundary, so the pattern always covers a whole word.

  • In ((?<!:)\w*XYZ\w*), we match any word that contains XYZ, with zero or more word characters before it and zero or more after it. The negative lookbehind (?<!:) makes sure that the word is not directly preceded by a :.

  • As mentioned by @unclexo in the comments, you can add the u modifier at the end to support UTF-8 sequence matching. See here for more info.

  • You can also add the i flag for case-insensitive matching.
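
To make the effect of the u modifier concrete, here is a minimal sketch that runs the same pattern with and without it. The input string is the one from the comments below; the variable names are only illustrative, and the file is assumed to be saved as UTF-8.

```php
<?php

// Same pattern, with and without the u modifier (everything else unchanged).
$withoutU = '/\b((?<!:)\w*XYZ\w*)\b/i';
$withU    = '/\b((?<!:)\w*XYZ\w*)\b/ui';

// Sample input taken from the comments; assumes this file is UTF-8 encoded.
$text = 'This isäXYZ';

preg_match_all($withoutU, $text, $bytes);
preg_match_all($withU, $text, $unicode);

print_r($bytes[0]);   // without u: typically only the ASCII tail of the word
print_r($unicode[0]); // with u: the whole word, including the umlaut
```

Without u the subject is processed byte by byte, so \w usually does not treat the bytes of ä as word characters. With u the subject is read as UTF-8 and ä counts as a letter.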

Snippet:

<?php

// Sample inputs; words that directly follow a colon must not be matched.
$tests = [
    'This isXYZ a :exampleXYZa',
    'isXYZ a :exampleXYZa abcXYZ',
    'isXYZ a :exampleXYZXYZa  abcXYZ',
    'XYZ',
    'XYZjdhf',
    'This isXYZ a example:XYZa',
    'äöüéèXYZ :äöüéèXYZäöüéè',
];

foreach ($tests as $test) {
    // $matches[0] holds the full matches for the current test string.
    if (preg_match_all('/\b((?<!:)\w*XYZ\w*)\b/ui', $test, $matches)) {
        print_r($matches[0]);
    }
}

Demo: https://3v4l.org/Y8SMj
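
If the pattern is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: the function name findWordsContaining and its $needle parameter are hypothetical and not part of the answer above. It drops the capturing group, since only the full match is used, and passes the needle through preg_quote() so that regex metacharacters in it are treated literally.

```php
<?php

/**
 * Hypothetical helper: returns all words in $text that contain $needle
 * and are not directly preceded by a colon.
 */
function findWordsContaining(string $text, string $needle = 'XYZ'): array
{
    // preg_quote() escapes regex metacharacters; '/' is the pattern delimiter.
    $pattern = '/\b(?<!:)\w*' . preg_quote($needle, '/') . '\w*\b/ui';

    // preg_match_all() returns false on error, e.g. for invalid UTF-8 input.
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }

    return $matches[0];
}

print_r(findWordsContaining('This isXYZ a :exampleXYZa')); // matches: isXYZ
print_r(findWordsContaining('äöüéèXYZ :äöüéèXYZäöüéè'));   // matches: äöüéèXYZ
```

Returning an empty array on failure keeps the calling code simple. Whether a failed match should instead raise an error is a design choice left open here.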

nice_dev
  • @Nick Nice catch. I am assuming it shouldn't match, but let's see what OP considers it to be. – nice_dev Jan 25 '20 at 06:37
  • Thanks for your solution. The downside of using `\b` is that `\b` matches only the position between an ASCII character and a non-ASCII character. This means umlauts like `äöüéè` and many other characters are not supported. For example, with `This isäXYZ` it would only find `XYZ` and not `isäXYZ`. – Adam Jan 25 '20 at 06:51
  • @Adam It should work if you add the `u` flag at the end of the pattern. – unclexo Jan 25 '20 at 07:00
  • @Adam Updated my answer. Seems to work fine with `This isäXYZ` too. – nice_dev Jan 25 '20 at 07:10
  • @Adam You should now select the answer. Thanks – unclexo Jan 25 '20 at 07:12