3

I need to find a predefined set of hashtags in a blob of text, then extract the number that follows each one, if any. E.g. I'd need to extract 30 from "Test string with #other30 hashtag". I assumed preg_match_all would be the right choice.

Some test code:

$hashtag = '#other';
$string  = 'Test string with #other30 hashtag';
$matches = [];
preg_match_all('/' . $hashtag . '\d*/', $string, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => #other30
        )
)

Perfect... Works as expected. Now to extract the number:

$string = $matches[0][0]; // #other30
$matches = [];
preg_match_all('/\d*/', $string, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] =>
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
            [6] => 30
            [7] =>
        )
)

What? Looks like it's trying to match every character?

I'm aware of some preg_match_all related answers (one, two), but they all use a parenthesized subpattern. According to the documentation, it is optional.

What am I missing? How do I simply get all matches of such a basic regex as /\d*/ into an array? There doesn't seem to be a more appropriate function in PHP for that.

I never thought I'd be scratching my head with such a basic thing in PHP. Much appreciated.

Vigintas Labakojis

4 Answers

2

You need to replace:

preg_match_all('/\d*/', $string, $matches);

with:

preg_match_all('/\d+/', $string, $matches);

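Replace * with +, because:

* matches zero or more times, so \d* also matches the empty string at every position that has no digit.

+ matches one or more times, so \d+ matches only where at least one digit is present.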
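
To make the difference concrete, here is a minimal sketch that runs both quantifiers against the string from the question. The variable names $star and $plus are only illustrative.

```php
<?php
// The intermediate string from the question.
$string = '#other30';

// \d* may match zero digits, so it also succeeds (with an empty match)
// at every position where no digit is present, and once more at the end.
preg_match_all('/\d*/', $string, $star);

// \d+ requires at least one digit, so only the real number is returned.
preg_match_all('/\d+/', $string, $plus);

print_r($star[0]); // eight entries, only index 6 is non-empty ("30")
print_r($plus[0]); // a single entry: "30"
```

The six empty entries before "30" correspond to the six non-digit characters of "#other", and the last one is the empty match at the end of the string.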

Muhammad Bilal
1

You can use a capturing group:

preg_match_all('/' . $hashtag . '(\d*)/', $string, $matches); 
echo $matches[1][0] . "\n";
//=> 30

Here (\d*) will capture the number after $hashtag.
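
As a rough sketch of how the capturing group behaves when the hashtag appears several times, the example below uses a made-up test string (an assumption, not from the question). A hashtag with no number after it still matches, and its group is an empty string.

```php
<?php
$hashtag = '#other';
// Hypothetical input: one hashtag with 30, one without a number, one with 7.
$string  = 'Test #other30 and #other with #other7';

preg_match_all('/' . $hashtag . '(\d*)/', $string, $matches);

// $matches[0] holds the full matches, $matches[1] holds what (\d*) captured.
print_r($matches[0]); // "#other30", "#other", "#other7"
print_r($matches[1]); // "30", "", "7"
```

If hashtags without a number should be skipped entirely, using (\d+) in place of (\d*) would be one way to do it.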

anubhava
1

Also note that you can reset the start of the reported match at a certain point with \K, so that only the part after it is returned. And of course you need \d+ instead of \d* to match one or more digits. Otherwise there would be empty matches in the gaps between the characters, where zero digits match.


So your code can be reduced to

$hashtag = '#other';
$string  = 'Test string with #other30 #other31 hashtag';
preg_match_all('/' . $hashtag . '\K\d+/', $string, $matches);
print_r($matches[0]);

See the demo at eval.in and consider using preg_quote for $hashtag.
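
Here is a small sketch of why preg_quote can matter. The hashtag '#c++' and the test string are assumptions chosen for illustration: without escaping, the + characters would be read as quantifiers rather than literal text.

```php
<?php
// Hypothetical hashtag containing regex metacharacters.
$hashtag = '#c++';
$string  = 'Tags: #c++11 and #c++17';

// The second argument tells preg_quote to also escape the '/' delimiter.
$pattern = '/' . preg_quote($hashtag, '/') . '\K\d+/';

preg_match_all($pattern, $string, $matches);
print_r($matches[0]); // "11", "17"
```

For a plain hashtag such as '#other' the quoted and unquoted patterns behave the same, so the call costs little and guards against surprises when the hashtag list is not under your control.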

bobble bubble
0

PHP Fiddle

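<?php

    $hashtag = '#other';
    $string  = 'Test string with #other30 hashtag';
    $matches = [];
    // First pass: find the hashtag together with any digits that follow it.
    preg_match_all('/' . $hashtag . '\d*/', $string, $matches);
    // Second pass: pull the digits out of the first match ('#' is the delimiter here).
    // The return value (the match count) is not assigned to $string anymore.
    preg_match_all('#\d+#', $matches[0][0], $m);
    echo $m[0][0];

?>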
Mi-Creativity