
I have a string like this:

page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags

I wrote this regex, which I expected to match whole tags:

(?<=\,)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=\,)

I want it to match all the occurrences.

In this case:

page-9000 and rss-latest.

This regex matches whole tags between commas just fine, but it misses the first and the last tag because they are not surrounded by commas on both sides (obviously).
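
To make the problem concrete, here is a minimal sketch that runs the pattern above against the sample string. The variable names ($tags, $pattern, $found) are only illustrative and not part of my actual code.

```php
<?php
// Sample string and pattern from the question; variable names are illustrative.
$tags = 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags';
$pattern = '/(?<=\,)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=\,)/';

preg_match_all($pattern, $tags, $found);

// Only tags with a comma on both sides can match,
// so the leading page-9000 is missed and only rss-latest is found.
print_r($found[1]);
```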

I've also tried making it check for a comma on both sides OR only a comma before OR only a comma after; however, that gives me false positives, as it would match:

category-128

while the string contains:

page-category-128

Any help?

prgrm

3 Answers


Try using the following pattern:

(?<=,|^)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=,|$)

The only change I made is to add the anchors ^ and $ to the lookarounds, so that the pattern also matches at the start and end of the input.

Script:

$input = "page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags";
preg_match_all("/(?<=,|^)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=,|$)/", $input, $matches);
print_r($matches[1]);

This prints:

Array
(
    [0] => page-9000
    [1] => rss-latest
)
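
If the list of tags is not hard-coded, the same pattern could be built at runtime. The following is only a sketch under that assumption: matchWholeTags is a hypothetical helper name, it uses a non-capturing group instead of the capturing one above, and the arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the wanted
// tags that appear as whole comma-separated items in $csv.
function matchWholeTags(string $csv, array $wanted): array
{
    // Escape each tag so regex metacharacters in it are taken literally.
    $alternatives = implode('|', array_map(
        fn (string $tag): string => preg_quote($tag, '/'),
        $wanted
    ));

    // Same lookarounds as above: a comma or the start/end of the input.
    $pattern = '/(?<=,|^)(?:' . $alternatives . ')(?=,|$)/';

    preg_match_all($pattern, $csv, $matches);
    return $matches[0];
}

$input = 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags';
$wanted = ['rss-latest', 'listing-latest-no-category', 'category-128', 'page-9000'];

print_r(matchWholeTags($input, $wanted));
```

Escaping with preg_quote matters only if a tag could contain regex metacharacters; for the tags in the question it does not change which tags are matched.
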
Tim Biegeleisen

Here is a non-regex way using explode and array_intersect:

$arr1 = explode(',', 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags');

$arr2 = explode('|', 'rss-latest|listing-latest-no-category|category-128|page-9000');

print_r(array_intersect($arr1, $arr2));

Output:

Array
(
    [0] => page-9000
    [6] => rss-latest
)
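
Note that array_intersect preserves the keys of its first argument, which is why the output above shows keys 0 and 6. If a zero-based list is preferred, one option is to wrap the call in array_values, as in this small sketch (the variable names are illustrative):

```php
<?php
$tags = explode(',', 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags');
$wanted = ['rss-latest', 'listing-latest-no-category', 'category-128', 'page-9000'];

// array_intersect() keeps the original keys (0 and 6 here);
// array_values() reindexes the result starting from zero.
$found = array_values(array_intersect($tags, $wanted));

print_r($found);
```
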
anubhava

The (?<=\,) and (?=\,) lookarounds require a , on both sides of the match. You also want to match at the start and end of the string, so you need to either say so explicitly (match either , or the start/end of the string) or use double-negation logic: negated character classes inside negative lookarounds.

You may use

(?<![^,])(?:rss-latest|listing-latest-no-category|category-128|page-9000)(?![^,])


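Here, (?<![^,]) matches a position that is either at the start of the string or right after a , and (?![^,]) matches a position that is either at the end of the string or right before a , (neither lookaround consumes the comma itself).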

Now you do not even need a capturing group; you can avoid its overhead by using a non-capturing group, (?:...). preg_match_all then does not have to store the submatches, and the resulting array is cleaner.

PHP demo:

$re = '/(?<![^,])(?:rss-latest|listing-latest-no-category|category-128|page-9000)(?![^,])/m';
$str = 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags';

if (preg_match_all($re, $str, $matches)) {
  print_r($matches[0]);
}
// => Array ( [0] => page-9000 [1] => rss-latest )
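
As a quick sanity check of the boundary behaviour described above, here is a small sketch that tries the same lookarounds on a few made-up inputs. The test strings and variable names are assumptions for illustration, and the alternation is shortened to two tags.

```php
<?php
$re = '/(?<![^,])(?:category-128|page-9000)(?![^,])/';

// Made-up test strings mapped to whether a whole-tag match is expected.
$cases = [
    'category-128'      => true,  // the tag is the entire string
    'a,category-128'    => true,  // the tag is the last item
    'category-128,b'    => true,  // the tag is the first item
    'page-category-128' => false, // preceded by "-", not by a comma
    'category-1289'     => false, // followed by "9", not by a comma
];

$results = [];
foreach ($cases as $subject => $expected) {
    // preg_match() returns 1 when the pattern matches somewhere in $subject.
    $results[$subject] = preg_match($re, $subject) === 1;
    printf("%-20s %s\n", $subject, $results[$subject] ? 'match' : 'no match');
}
```
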
Wiktor Stribiżew