I am trying to extract all strings that look like 12-15 from a parent string. This means all strings that have a dash in between two digits.

Using this answer as a basis, I tried the following:

<?php

$str = "34,56,67-90,45";
preg_match('/^(\d-\d)|(,\d-\d)|(\d-\d,)|(,\d-\d,)$/', $str, $output, PREG_OFFSET_CAPTURE);
print_r($output); // print_r already prints; wrapping it in echo also prints its return value "1"

?>

This looks for any substring that looks like a dash enclosed between digits, whether it has a comma before, after, both, or neither. When I run the PHP code, I get an empty array. On Regex101, when I test the regular expression, strings like 4-5,,,,, seem to match, and I don't understand why it lets me add extra commas.

What's wrong with my regex that I get an empty array?

Honinbo Shusaku

3 Answers

I think you could use a simple regex like this

\d+[-]\d+

That is (match at least 1 digit) (match a literal dash) (match at least 1 digit)
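As a minimal sketch of how this pattern could be applied to the sample string from the question, the snippet below uses preg_match_all to collect every match. The variable name $matches is only illustrative. Note that this pattern does not check for surrounding commas, so it would also find a digit-dash-digit run inside a longer token.

```php
<?php
// Sample input taken from the question.
$str = "34,56,67-90,45";

// preg_match_all collects every match; $matches[0] holds the full matches.
preg_match_all('/\d+-\d+/', $str, $matches);

print_r($matches[0]);
// Array
// (
//     [0] => 67-90
// )
```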

CollinD

\d matches a single digit, but all the numbers in your sample string have two digits. Use \d+ to match one or more digits.

preg_match('/^(\d+-\d+)|(,\d+-\d+)|(\d+-\d+,)|(,\d+-\d+,)$/', $str, $output, PREG_OFFSET_CAPTURE);

Output:

Array
(
    [0] => Array
        (
            [0] => ,67-90
            [1] => 5
        )

    [1] => Array
        (
            [0] => 
            [1] => -1
        )

    [2] => Array
        (
            [0] => ,67-90
            [1] => 5
        )

)

You can also simplify the regexp:

preg_match('/(?:^|,)\d+-\d+(?:,|$)/', $str, $output, PREG_OFFSET_CAPTURE);

Output:

Array
(
    [0] => Array
        (
            [0] => ,67-90,
            [1] => 5
        )

)
Barmar
  • How would I extract ALL instances of the pattern into the array? – Honinbo Shusaku Sep 04 '15 at 02:36
  • Use `preg_match_all` instead of `preg_match`. – Barmar Sep 04 '15 at 02:39
  • The flags for `preg_match_all` are a bit confusing. I just wanted an array of the matches found in the parent string that I can iterate through. It seems that if I use the flag `PREG_PATTERN_ORDER` and do `preg_match_all($pattern, $str, $output, PREG_PATTERN_ORDER);`, `$output[0]` is the array I'm looking for. I've done a few tests, and they seem to be consistent. To make sure, am I correct about this? – Honinbo Shusaku Sep 04 '15 at 02:52
  • Yes, that's correct. And I agree, the options are a bit confusing, I can never remember which way is the default. – Barmar Sep 04 '15 at 02:53
  • Your regex fails when there are consecutive pairs of numbers `34,56,67-90,34-53,24-23` https://regex101.com/r/kO2iD7/1 – nhahtdh Sep 04 '15 at 03:53
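To make the two points from these comments concrete, here is a small sketch comparing the two preg_match_all orderings on the consecutive-pair input. The capture group around \d+-\d+ is an addition for illustration and is not part of the pattern in the answer; the variable names are hypothetical as well. PREG_PATTERN_ORDER is the default when no flag is passed.

```php
<?php
// Input from the comment about consecutive pairs.
$str = "34,56,67-90,34-53,24-23";

// The answer's simplified pattern, with a capture group added (assumption)
// so the difference between the two orderings is visible.
$pattern = '/(?:^|,)(\d+-\d+)(?:,|$)/';

// PREG_PATTERN_ORDER (the default): $byPattern[0] lists all full matches,
// $byPattern[1] lists all captures of group 1.
$count = preg_match_all($pattern, $str, $byPattern, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: one array per match, each holding [full match, group 1].
preg_match_all($pattern, $str, $bySet, PREG_SET_ORDER);

// Only two of the three pairs are found: the trailing (?:,|$) consumes the
// comma after 67-90, so 34-53 has no leading comma left to match.
print_r($byPattern[1]); // 67-90 and 24-23
print_r($bySet);        // [",67-90,", "67-90"] and [",24-23", "24-23"]
```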

The | operator has the lowest precedence, so ^ applies only to the first alternative and $ only to the last. Your expression is therefore interpreted as "match any one of the following":

  1. START of text -> 1 digit -> dash -> 1 digit (not matching end of text)
  2. Comma (may be in the middle of the text, anywhere) -> 1 digit -> dash -> 1 digit
  3. 1 digit (anywhere) -> dash -> 1 digit -> comma
  4. comma (anywhere) -> 1 digit -> dash -> 1 digit -> comma -> END of text
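The list above can be checked with a short sketch. The $grouped pattern is a hypothetical variant, not something from the question: it wraps the alternatives in a non-capturing group so that ^ and $ apply to all of them, which is only meant to show the difference in anchoring.

```php
<?php
// The question's original pattern: ^ binds only to the first alternative,
// $ only to the last one.
$original = '/^(\d-\d)|(,\d-\d)|(\d-\d,)|(,\d-\d,)$/';

// Hypothetical grouped variant: ^ and $ now apply to every alternative.
$grouped = '/^(?:(\d-\d)|(,\d-\d)|(\d-\d,)|(,\d-\d,))$/';

// Alternative 1 matches the prefix "4-5" and nothing forces the match
// to reach the end of the text, so the extra commas are ignored.
$extraCommas = preg_match($original, "4-5,,,,,");

// With grouping, the whole string must fit exactly one alternative.
$extraCommasGrouped = preg_match($grouped, "4-5,,,,,");

// The question's input: no alternative fits, because \d is a single digit.
$questionInput = preg_match($original, "34,56,67-90,45");

var_dump($extraCommas);        // int(1)
var_dump($extraCommasGrouped); // int(0)
var_dump($questionInput);      // int(0)
```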

Also, you are using \d, which matches exactly one digit (a single character). You can use \d{2} to match two digits (00 to 99), or \d+ to match any run of one or more digits (1, 55, 123456, etc.).


In your case, I think you're trying to use this expression:

/(?:^|,)(\d+-\d+)(?=,|$)/

which means: START of text OR comma -> any integer -> dash -> any integer -> followed by (but not consumed in the match) a comma OR END of text
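A minimal sketch of this expression with preg_match_all, assuming the goal is to collect every pair, including consecutive ones. The input string is the one from the comments; $matches is an illustrative name.

```php
<?php
$str = "34,56,67-90,34-53,24-23";

// The lookahead (?=,|$) checks for a comma or the end of the text without
// consuming it, so that comma can still start the next match.
preg_match_all('/(?:^|,)(\d+-\d+)(?=,|$)/', $str, $matches);

// $matches[1] holds group 1 of each match: the pair without its comma.
print_r($matches[1]);
// Array
// (
//     [0] => 67-90
//     [1] => 34-53
//     [2] => 24-23
// )
```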

Mariano
  • Your regex fails when there are consecutive pairs of numbers `34,56,67-90,34-53,24-23` – nhahtdh Sep 04 '15 at 03:54
  • True. I was misled by the use of preg_match instead of preg_match_all. Edited the answer to use a lookahead. Thanks – Mariano Sep 04 '15 at 04:40