
I want to parse an HTML string using PHP (simple number matching):

<i>1002</i><i>999</i><i>344</i><i>663</i>

I want the result as an array, e.g. [1002, 999, 344, 663, ...]. I tried this:

<?php
    $html="<i>1002</i><i>999</i><i>344</i><i>663</i>";
    if(preg_match_all("/<i>[0-9]*<\/i>/",$html, $matches,PREG_SET_ORDER))
        foreach($matches as $match) {
            echo strip_tags($match[0])."<br/>";
        }
?>

and got exactly the output I wanted:

1002
999
344
663

But when I make one small change to the regular expression, I get a different result.

Like this:

<?php
    $html="<i>1002</i><i>999</i><i>344</i><i>663</i>";
    if(preg_match_all("/<i>.*<\/i>/",$html, $matches,PREG_SET_ORDER))
        foreach($matches as $match) {
            echo strip_tags($match[0])."<br/>";
        }
?>

Output:

1002999344663

(The regular expression matched the entire string.)

Why does this happen? What is the difference between using .* (zero or more of any character) and [0-9]* (zero or more digits)?

gen_Eric
Vishal Vijay

1 Answer


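The .* in your regex matches any character except a newline, so it can run straight across </i><i>. [0-9]* matches only digits, and </i><i> contains no digits, so it stops at each closing tag. The regex /<i>.*<\/i>/ therefore matches: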

<i>1002</i><i>999</i><i>344</i><i>663</i>
^ from here ------------------- to here ^

That is a single match covering the whole string, because the string begins with <i> and ends with </i>.

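This is because * is greedy: it matches as many characters as it can while still letting the overall match succeed. Here .* first consumes the rest of the string, then the engine backtracks only as far as needed to match <\/i>, so the match ends at the last </i> rather than the first.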
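The difference is easy to see by counting matches. The sketch below runs both patterns with preg_match_all, which returns the number of full-pattern matches; the variable names are only for illustration.

```php
<?php
// Sketch: compare how many matches the greedy and lazy patterns produce.
$html = "<i>1002</i><i>999</i><i>344</i><i>663</i>";

// Greedy: .* runs to the end of the string, then backtracks to the LAST </i>.
$greedyCount = preg_match_all('/<i>.*<\/i>/', $html, $greedy);

// Lazy: .*? stops at the FIRST </i> after each <i>.
$lazyCount = preg_match_all('/<i>.*?<\/i>/', $html, $lazy);

echo $greedyCount, "\n"; // 1
echo $lazyCount, "\n";   // 4
echo $greedy[0][0], "\n"; // the whole input string
```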

To fix your problem, use the lazy (non-greedy) quantifier .*? instead. It matches as few characters as possible, so each match ends at the first </i> after its <i>.

The regex /<i>.*?<\/i>/ will work as you want.
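To get the array from the question rather than echoing stripped tags, one option is to add a capture group around the lazy part. This is a minimal sketch, assuming PHP 5.4+ (short array syntax) and that every <i> element holds a plain integer; the capture group and the intval conversion are additions, not part of the answer above.

```php
<?php
// Sketch: collect the numbers into an array using a lazy pattern plus a capture group.
$html = "<i>1002</i><i>999</i><i>344</i><i>663</i>";

$numbers = [];
// (.*?) captures only the text between <i> and the nearest </i>.
// With the default PREG_PATTERN_ORDER, $matches[1] lists every group-1 capture.
if (preg_match_all('/<i>(.*?)<\/i>/', $html, $matches)) {
    // Convert the captured strings to integers.
    $numbers = array_map('intval', $matches[1]);
}

print_r($numbers);
```

After this runs, $numbers holds the integers 1002, 999, 344 and 663 in that order. Without PREG_SET_ORDER there is no need to loop over individual match sets or call strip_tags, since group 1 already excludes the tags.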

gen_Eric