
I've worked with regexes for years and never had this much trouble working out a regex. I am using PHP 7.2.2's preg_match() to return an array of matching numbers for parsing, hence the parentheses in the regex.

I am trying to match one or more digits, followed by an x, followed by one or more digits, where the entire sequence is not followed by a hyphen. When $input is 18x18, 18x18- or 18x18size, the matches are 18 and 1. When $input is 8x8, there are no matches.

I seem to be doing something fundamentally wrong here.

<?php
$input = "18x18";
preg_match("/(\d+)x(\d+)[^-]/", $input, $matches);
print_r($matches);

Calling print_r($matches) outputs:

Array
(
    [0] => 18x18
    [1] => 18
    [2] => 1
)

I understand when hyphens should be escaped, and I've tried both ways to be sure, but I get the same results. Why doesn't this match the way I expect?
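A minimal sketch of what the question's pattern does on each input. `[^-]` must consume one more character after the second run of digits, so when nothing (or only a `-`) follows `18x18`, the engine backtracks and hands the final `8` to `[^-]`, leaving `1` in group 2. By the same reasoning, `18x18size` should give `18` and `18`, because `[^-]` can consume the `s` without backtracking. The helper name `runQuestionPattern` is hypothetical.

```php
<?php
// The pattern from the question: [^-] must consume one more character
// after the second run of digits.
$questionPattern = '/(\d+)x(\d+)[^-]/';

// Hypothetical helper: returns [full match, group 1, group 2], or null.
function runQuestionPattern(string $pattern, string $input): ?array
{
    return preg_match($pattern, $input, $m) ? [$m[0], $m[1], $m[2]] : null;
}

foreach (['18x18', '18x18-', '18x18size', '8x8'] as $input) {
    $r = runQuestionPattern($questionPattern, $input);
    // For "18x18" and "18x18-" the engine backtracks so that [^-]
    // can take the final "8", which leaves "1" in group 2.
    echo $input, ': ', $r === null ? 'no match' : implode(' | ', $r), "\n";
}
```

This should print `18x18: 18x18 | 18 | 1`, `18x18-: 18x18 | 18 | 1`, `18x18size: 18x18s | 18 | 18` and `8x8: no match`.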

Wiktor Stribiżew
Todd Hammer
  • Do you want it like [`'~(\d+)x(\d++)(?!-)~'`](https://regex101.com/r/vvDqST/1)? – Wiktor Stribiżew Jul 10 '18 at 11:49
  • @WiktorStribiżew Can you explain the possessive `++` please? – Zenoo Jul 10 '18 at 11:51
  • PCRE regex should not be confused with Python `re`, which does not support possessive quantifiers, so this question cannot be closed as a duplicate of [Negative lookahead not working after character range with plus quantifier](https://stackoverflow.com/questions/46458408/negative-lookahead-not-working-after-character-range-with-plus-quantifier). – Wiktor Stribiżew Apr 06 '21 at 08:18
  • This question is not about the use of possessive quantifiers. It is about matching something not followed by a character. As the [selected answer](https://stackoverflow.com/a/46458459/548225) shows, a word boundary solution suffices for both `python` and `php`. – anubhava Apr 06 '21 at 10:30
  • No, it is not always sufficient. A word boundary is not a solution at all; it is a *workaround*. `(\d+)x(\d+)\b(?!-)` [won't match](https://regex101.com/r/tgDrns/1) the size in `12x24size`, while `(\d+)x(\d++)(?!-)` [will](https://regex101.com/r/tgDrns/2). – Wiktor Stribiżew Apr 08 '21 at 22:30
  • When the input, as per the OP, can be both `18x18` and `18x18-`, handling `12x24size` is also necessary. – Wiktor Stribiżew May 24 '21 at 00:47
  • `12x24size` is not part of the OP's requirement. Requirements should come from the OP, not from the posted answers. – anubhava May 28 '21 at 15:28

1 Answer


You may use

'~(\d+)x(\d++)(?!-)~'

It can also be written without a possessive quantifier as '~(\d+)x(\d+)(?![-\d])~', since the \d inside the lookahead prevents a match that ends partway through the second run of digits.
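A quick way to convince yourself that the two forms behave the same is to run both over a few inputs and compare the captured groups. This is a sketch on a handful of hand-picked strings, not a proof of equivalence; the helper name `sizeGroups` is hypothetical.

```php
<?php
// The two forms given above.
$possessive = '~(\d+)x(\d++)(?!-)~';
$lookahead  = '~(\d+)x(\d+)(?![-\d])~';

// Hypothetical helper: returns [group 1, group 2], or null when there is no match.
function sizeGroups(string $pattern, string $input): ?array
{
    return preg_match($pattern, $input, $m) ? [$m[1], $m[2]] : null;
}

foreach (['18x18', '18x18-', '18x18size', '8x8', '8x8-'] as $input) {
    // Both forms should reject a trailing hyphen and never split the second number.
    printf("%-10s possessive=%s lookahead=%s\n", $input,
        json_encode(sizeGroups($possessive, $input)),
        json_encode(sizeGroups($lookahead, $input)));
}
```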

Alternatively, in addition to the lookahead, you may use word boundaries:

'~\b(\d+)x(\d+)\b(?!-)~'

See the regex demo #1 and regex demo #2.
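One practical difference between the two approaches, raised in the comments under the question, is input such as `12x24size`: the trailing `\b` cannot match between `4` and `s`, so the word-boundary version finds nothing there, while the lookahead-only version still extracts the size. A small sketch to compare them (the helper `firstSize` is hypothetical):

```php
<?php
$lookaheadOnly  = '~(\d+)x(\d++)(?!-)~';
$withBoundaries = '~\b(\d+)x(\d+)\b(?!-)~';

// Hypothetical helper: returns "W x H" for the first match, or null.
function firstSize(string $pattern, string $input): ?string
{
    return preg_match($pattern, $input, $m) ? "{$m[1]} x {$m[2]}" : null;
}

foreach (['18x18', '18x18-', '12x24size'] as $input) {
    echo $input, ': lookahead only = ', firstSize($lookaheadOnly, $input) ?? 'no match',
        ', with \b = ', firstSize($withBoundaries, $input) ?? 'no match', "\n";
}
```

Which one to pick depends on whether sizes glued to letters, such as `12x24size`, should count as matches.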

Details

  • (\d+)x(\d++)(?!-) / (\d+)x(\d+)(?![-\d]) - matches and captures one or more digits into Group 1, then matches x, then matches and captures one or more digits into Group 2. With \d++ the digits are matched possessively, so the engine cannot backtrack into them, and the (?!-) negative lookahead (which makes sure there is no - immediately after the current position) is checked only once, after \d++ has matched all the digits it can. In the case of \d+(?![-\d]), the digits are matched greedily first, and the negative lookahead then makes sure there is neither a digit nor a - immediately to the right of the current position, so giving back digits cannot produce a match either.
  • \b(\d+)x(\d+)\b(?!-) - matches a word boundary first, then matches and captures 1 or more digits into Group 1, then matches x, then matches and captures into Group 2 one or more digits, then asserts that there is a word boundary, and only then makes sure there is no - right after.
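To see why the possessive quantifier (or the extra `\d` in the lookahead) matters, here is a sketch that drops it and uses a plain greedy `\d+` before `(?!-)`. On `18x18-` the lookahead fails after `18`, so the engine gives back one digit and succeeds in front of the final `8`. The helper `fullMatch` is hypothetical.

```php
<?php
// Same pattern twice: once with a plain greedy \d+, once possessive \d++.
$greedy     = '~(\d+)x(\d+)(?!-)~';
$possessive = '~(\d+)x(\d++)(?!-)~';

// Hypothetical helper: returns the full match, or null when nothing matches.
function fullMatch(string $pattern, string $input): ?string
{
    return preg_match($pattern, $input, $m) ? $m[0] : null;
}

// The greedy version backtracks: (?!-) fails after "18", so \d+ gives
// back one digit and the lookahead succeeds in front of the final "8".
var_dump(fullMatch($greedy, '18x18-'));      // string(4) "18x1"
// The possessive version cannot give digits back, so there is no match.
var_dump(fullMatch($possessive, '18x18-'));  // NULL
```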

See a PHP demo:

<?php
// Lookahead-only version with a possessive quantifier
if (preg_match('~(\d+)x(\d++)(?!-)~', "18x18", $m)) {
    echo "18x18: " . $m[1] . " - " . $m[2] . "\n";
}
// Word-boundary version
if (preg_match('~\b(\d+)x(\d+)\b(?!-)~', "18x18", $m)) {
    echo "18x18: " . $m[1] . " - " . $m[2] . "\n";
}
// Neither pattern matches when the size is followed by a hyphen
if (preg_match('~(\d+)x(\d++)(?!-)~', "18x18-", $m)) {
    echo "18x18-: " . $m[1] . " - " . $m[2] . "\n";
}
if (preg_match('~\b(\d+)x(\d+)\b(?!-)~', "18x18-", $m)) {
    echo "18x18-: " . $m[1] . " - " . $m[2] . "\n";
}

Output:

18x18: 18 - 18
18x18: 18 - 18
Wiktor Stribiżew
  • In addition, see [Greedy vs. Reluctant vs. Possessive Quantifiers](https://stackoverflow.com/questions/5319840/greedy-vs-reluctant-vs-possessive-quantifiers) for more info about the possessive quantifier (I'm leaving this here since I was confused myself) – Zenoo Jul 10 '18 at 11:57
  • @Zenoo I added an equivalent pattern that does not use a possessive quantifier. – Wiktor Stribiżew Jul 10 '18 at 12:02
  • I am astounded at how complex this turns out to be but it works perfectly. Something so simple should be easier to accomplish. I have never used possessive quantifiers or word boundaries before. I will need to study to understand their use. Thanks! – Todd Hammer Jul 10 '18 at 12:08
  • Dear Wiktor, may I ask what the difference is between `\d++` and `\d+` here, or in general? – Anoushiravan R Sep 29 '21 at 20:16
  • @AnoushiravanR `\d++` is modified with a possessive quantifier, which forbids backtracking into the pattern (no re-matching of the input is allowed: once the one or more digits are matched the first time, the regex index cannot go back through this matched text). So, in `\d++(?!x)` the negative lookahead is executed once after all digits are matched, and if an `x` follows, the match attempt at that position fails. In the case of `\d+(?!x)`, backtracking into `\d+` is allowed, and if two or more digits are followed by `x`, the last one will be cut off and a match will occur. – Wiktor Stribiżew Sep 29 '21 at 21:43
  • Thanks for the thorough explanation dear Wiktor. – Anoushiravan R Sep 29 '21 at 23:40
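The behaviour described in the comment above can be checked with a minimal sketch on a plain run of digits; the helper `firstMatch` and the sample strings are illustrative, not from the original thread.

```php
<?php
// Hypothetical helper: returns the first match of $pattern in $subject, or null.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) ? $m[0] : null;
}

// Greedy \d+ may backtrack: "123" is followed by x, so it retries with "12".
var_dump(firstMatch('~\d+(?!x)~', '123x'));   // string(2) "12"
// Possessive \d++ keeps all digits; the lookahead fails at every start position.
var_dump(firstMatch('~\d++(?!x)~', '123x'));  // NULL
// With no x after the digits, the possessive form matches the whole run.
var_dump(firstMatch('~\d++(?!x)~', '123y'));  // string(3) "123"
```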