
What is the PHP equivalent for this Perl code?

my $html = '<tr class="aaa"><td class="bbb">111.111.111.111</td><td>443</td><td><div><span class="ccc"></span> example <span> example</span></div></td></tr><tr class="aaa"><td class="bbb">222.222.222.222</td><td>443</td><td><div><span class="ccc"></span> example <span> example</span></div></td></tr>';

print "$1:$2\n" while $html =~ /class="aaa"><td class="bbb">(.*?)<\/td><td>(\d+)<\/td>/g;

I tried this code, but it results in an infinite loop:

while (preg_match('/td class=\"bbb\">(.*?)<\/td><td>(\d+)<\/td>/', $html, $out)) {
    echo "$out[1]:$out[2]\n";
}

Also, with `if` instead of `while`, it gives only one result.

Expected output (IP:PORT):

111.111.111.111:443
222.222.222.222:443

Environment: Windows 7 with PHP 5.5.12 (WAMP v2.5).

tr0in
  • What have you tried so far to get this going? That might help us understand the problems you're facing :) – Henders Jul 26 '16 at 09:50
  • I tried with this code, but it gives infinite loop: `while(preg_match('/td class=\"bbb\">(.*?)<\/td>(\d+)<\/td>/',$html,$out)) { echo "$out[1]:$out[2]\n"; }` – tr0in Jul 26 '16 at 09:55
  • Please [edit] your question and add the PHP code there. It's hard to read in the comment. – simbabque Jul 26 '16 at 09:57
  • Also, with **if** instead of **while** it gives only one result. – tr0in Jul 26 '16 at 09:57
  • You should look at [this question](http://stackoverflow.com/questions/4088836/php-preg-match-and-preg-match-all-functions) and [the answer](http://stackoverflow.com/a/19767391/2233391) given, I think that will resolve your issue. – Henders Jul 26 '16 at 10:01
  • @Henders that explains the difference, but not what OP is doing wrong. – simbabque Jul 26 '16 at 10:05
  • @simbabque agreed, but it does point them in the right direction based on their comment _'with if instead of while it gives only one result'_. Looks like they were on the right track initially and then diverted when only a single result was returned. – Henders Jul 26 '16 at 10:08
  • @Henders I agree. But there is more, because `preg_match_all` also seems to only return _if_ it matched. Coming from a Perl background, I would iterate the matches array [as shown in my answer](http://stackoverflow.com/a/38586866/1331451). That might not be the most efficient way, and simply saying `if(preg_match_all(...))` and then using `$matches` might suffice, but I don't have a PHP to test it. – simbabque Jul 26 '16 at 10:10
  • @simbabque I've just tested it and on PHP 5.5.9 you can do that no problem. As you suggested, `if(preg_match_all(...))` and then `print_r($matches)` gives you the array of matches. – Henders Jul 26 '16 at 10:22
  • You should add that as an answer @Henders. Mine explains the _why_ part, but my lack of PHP knowledge is clearly visible. ;) – simbabque Jul 26 '16 at 10:27
  • Can you provide an example of your expected output, please? When I [run your code on PHPFiddle](http://phpfiddle.org/main/code/in7p-rb4g) it gives me a result that doesn't look like what you are looking for... – Henders Jul 26 '16 at 10:37
  • @Henders: If there are no matches then the `$out` array will be empty, so simply printing its contents regardless of the return value of `preg_match_all` will do the right thing. – Borodin Jul 26 '16 at 10:44
  • @Borodin I was missing the `PREG_SET_ORDER`. Nice spot! – Henders Jul 26 '16 at 10:49
  • @Henders: That just reorders the contents of `$out`; all of the information is there with or without it. – Borodin Jul 26 '16 at 11:01
  • @Borodin Much easier to access the data though! :) I'll remember that flag. – Henders Jul 26 '16 at 11:04
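Following the `if (preg_match_all(...))` suggestion in these comments, here is a minimal sketch of what the return value and `$out` look like. It uses a shortened copy of the question's `$html` (only the two `<td>` cells of each row are kept, an assumption for brevity). `preg_match_all()` returns how many times the whole pattern matched, and with `PREG_SET_ORDER` and no match `$out` is just an empty array, so looping over it prints nothing, as Borodin notes.

```php
<?php
// Pattern with | delimiters so / and " need no escaping.
$re = '|td class="bbb">([\d.]+)</td><td>(\d+)</td>|';

// Shortened version of the question's $html (assumption for brevity).
$html = '<tr class="aaa"><td class="bbb">111.111.111.111</td><td>443</td></tr>'
      . '<tr class="aaa"><td class="bbb">222.222.222.222</td><td>443</td></tr>';

// Returns the number of full matches, so it works directly as a condition.
$count = preg_match_all($re, $html, $out, PREG_SET_ORDER);
if ($count) {
    foreach ($out as $m) {
        echo "$m[1]:$m[2]\n";
    }
}

// No match: the count is 0 and, with PREG_SET_ORDER, $none is an empty
// array, so a foreach over it would simply do nothing.
$zero = preg_match_all($re, '<p>no table here</p>', $none, PREG_SET_ORDER);
```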

2 Answers


This code will do as you ask. It uses `preg_match_all` as simbabque described.

<?php

$html = '<tr class="aaa"><td class="bbb">221.86.2.163</td><td>443</td><td><div><span class="ccc"></span> example <span> example</span></div></td></tr><tr class="aaa"><td class="bbb">221.86.2.163</td><td>443</td><td><div><span class="ccc"></span> example <span> example</span></div></td></tr>';

preg_match_all('|td class="bbb">([\d.]+)</td><td>(\d+)</td>|', $html, $out, PREG_SET_ORDER);

foreach ( $out as $item ) {
    echo "$item[1]:$item[2]\n";
}

?>

Output:

221.86.2.163:443
221.86.2.163:443
Borodin
  • That's it, this solved my problem. PREG_SET_ORDER is the man; without it, it doesn't work! :D Thank you. – tr0in Jul 26 '16 at 10:46
  • From [the docs](http://php.net/manual/en/function.preg-match-all.php): _"PREG_SET_ORDER - Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on."_ – Henders Jul 26 '16 at 10:51
  • At first, I tried PHP Simple HTML DOM Parser. I could scrape the IPs, but I couldn't get the ports. – tr0in Jul 26 '16 at 10:56
  • *"without [PREG_SET_ORDER] it doesn't work"* Yes, it does work without that flag. It's simpler to unpack the contents of `$out` if you use it, but all the information is in there either way. – Borodin Jul 26 '16 at 11:00
  • @Borodin So, how do I access the information without setting the PREG_SET_ORDER flag? What is that "unpack" thing, how does it work, and how do I do it? Is there any other way to do this without storing the information in the `$out` array? Something as simple as the Perl code in the question, something like `if you find a match - print it, if you find a match - print it, if you find a match - print it` etc. – tr0in Jul 26 '16 at 11:24
  • @Henders Thank you, I'm new here; I didn't know about 'accepted answer'. – tr0in Jul 26 '16 at 11:26
  • @tr0in: If you have something else to ask then you should open a new question. – Borodin Jul 26 '16 at 11:35
  • Haha, Stack Overflow says I have to wait 5 days because I have reached the question limit. It's because someone downvoted my question. – tr0in Jul 26 '16 at 11:44
  • @tr0in: It's because *six people* downvoted your question, and rightly so. You can't just post your work on Stack Overflow and wait for someone else to do it. You should spend those days reading through [***How do I ask a good question?***](http://stackoverflow.com/questions/how-to-ask) – Borodin Jul 26 '16 at 12:25
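Borodin's point that the information is there with or without `PREG_SET_ORDER` can be checked with a small sketch. It assumes a shortened copy of the question's `$html`. Without the flag (the default, `PREG_PATTERN_ORDER`), `$byGroup[1]` is the list of all IPs and `$byGroup[2]` the list of all ports, both indexed by match number. With the flag, each `$byMatch[$i]` is one complete match. Both loops build the same lines; the flag only changes which index comes first.

```php
<?php
// Shortened version of the question's $html (assumption for brevity).
$html = '<tr class="aaa"><td class="bbb">111.111.111.111</td><td>443</td></tr>'
      . '<tr class="aaa"><td class="bbb">222.222.222.222</td><td>443</td></tr>';
$re = '|td class="bbb">([\d.]+)</td><td>(\d+)</td>|';

// Default PREG_PATTERN_ORDER: $byGroup[group][match]
preg_match_all($re, $html, $byGroup);
// PREG_SET_ORDER: $byMatch[match][group]
preg_match_all($re, $html, $byMatch, PREG_SET_ORDER);

// Pair up the IP list and the port list by match index.
$fromGroups = [];
foreach ($byGroup[1] as $i => $ip) {
    $fromGroups[] = $ip . ':' . $byGroup[2][$i];
}

// Each element already holds one complete match.
$fromSets = [];
foreach ($byMatch as $m) {
    $fromSets[] = $m[1] . ':' . $m[2];
}

echo implode("\n", $fromGroups), "\n";
```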

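The PHP function preg_match() returns 1 if the pattern matched, 0 if it did not (and false on error). It does not remember where the previous match ended: every call searches $html from the beginning. You are only looking at that return value in your loop, and since $html never changes, the condition will always be true. That's why you have an infinite loop.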

Because preg_match() stops after the first match, its $matches array only holds the capture groups of that one match, so with an if you only get the first result.
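A small sketch of why the `while` version never ends, using a shortened copy of the question's `$html` (an assumption for brevity) and the same pattern written with `|` delimiters. Calling `preg_match()` twice on the unchanged string returns 1 both times with identical captures, so a loop driven by its return value never advances past the first row.

```php
<?php
// Shortened version of the question's $html (assumption for brevity).
$html = '<tr class="aaa"><td class="bbb">111.111.111.111</td><td>443</td></tr>'
      . '<tr class="aaa"><td class="bbb">222.222.222.222</td><td>443</td></tr>';
$re = '|td class="bbb">(.*?)</td><td>(\d+)</td>|';

// preg_match() keeps no state between calls: both calls start at offset 0.
$first  = preg_match($re, $html, $a);  // returns 1
$second = preg_match($re, $html, $b);  // returns 1 again, same captures

echo "$a[1]:$a[2]\n";  // 111.111.111.111:443
echo "$b[1]:$b[2]\n";  // 111.111.111.111:443 (not the second row)
```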

The Perl code has the /g modifier on the regular expression match, which makes the match global. In scalar context, such as a while condition, a //g match returns true for each successive match and remembers where the last one ended (in pos($html)), so the next iteration continues from there instead of repeating a match. When no further match is found it returns false and the loop ends, so there is no infinite loop. Inside the loop, the match variables $1 and $2 hold the captured values. You need to use preg_match_all to get a global match in PHP.
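If you want to keep the Perl-style "find a match, print it, find the next one" loop instead of collecting all matches first, a sketch like the following may work. `preg_match()` accepts an `$offset` argument, and with `PREG_OFFSET_CAPTURE` each capture comes back as `[text, byte offset]`, so each call can resume where the previous match ended, much like Perl's `pos()`. The function name `ip_port_pairs` and the shortened `$html` are illustrative assumptions. This is only safe because the pattern can never match an empty string; with a pattern that can, the offset would stop advancing and the loop would never end.

```php
<?php
// Hypothetical helper: walks through $html one match at a time.
function ip_port_pairs($html)
{
    $re = '|td class="bbb">(.*?)</td><td>(\d+)</td>|';
    $offset = 0;
    $pairs = [];
    // PREG_OFFSET_CAPTURE turns each capture into [text, byte offset].
    while (preg_match($re, $html, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $pairs[] = $m[1][0] . ':' . $m[2][0];
        // Resume the search right after the whole match (like Perl's pos()).
        $offset = $m[0][1] + strlen($m[0][0]);
    }
    return $pairs;
}

// Shortened version of the question's $html (assumption for brevity).
$html = '<tr class="aaa"><td class="bbb">111.111.111.111</td><td>443</td></tr>'
      . '<tr class="aaa"><td class="bbb">222.222.222.222</td><td>443</td></tr>';

foreach (ip_port_pairs($html) as $line) {
    echo "$line\n";
}
```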

You need to match first, then iterate over the array of matches. Since the first element ($out[0]) holds the full matches, you can ignore it.

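preg_match_all('/td class=\"bbb\">(.*?)<\/td><td>(\d+)<\/td>/', $html, $out);
// By default $out[0] holds the full matches, $out[1] the IPs and $out[2] the ports,
// each indexed by match number
for ($i = 0; $i < count($out[1]); $i++) {
    echo $out[1][$i] . ':' . $out[2][$i] . "\n";
}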
simbabque
  • Nope. I changed `echo "$out\n";` to `echo "$match\n";` but `preg_match` gives only the first match, with a newline between IP and port. `preg_match_all` gives the error **Notice: Array to string conversion bla bla bla**. – tr0in Jul 26 '16 at 10:13
  • @tr0in see my update. I need to read the PHP docs for the syntax because I don't usually use PHP. The foreach approach was wrong; at first I didn't see that there are two capture groups in the regex. – simbabque Jul 26 '16 at 10:18
  • That regex is better as, for instance, `'td class="bbb">(.*?)(\d+)'` – Borodin Jul 26 '16 at 10:18
  • Which regex @Borodin? – simbabque Jul 26 '16 at 10:19
  • @simbabque I see your update, but it's the same. It gives the error **Notice: Array to string conversion bla bla bla.** for the 3rd and 4th lines of your code, `$out[$i]` and `$out[$i+1]`. – tr0in Jul 26 '16 at 10:24
  • As I said, I'm not a PHP guy. Treat this as pseudo-code and work with it. @Henders [gives good advice in their comment](http://stackoverflow.com/questions/38586517/perl-to-php-equivalent-extract-strings-with-regex?noredirect=1#comment64562734_38586517) on the question. I believe that approach is easier, though you still might need a loop if you don't know how many matches you will have. – simbabque Jul 26 '16 at 10:26
  • OK @simbabque, thanks anyway, I appreciate your effort. – tr0in Jul 26 '16 at 10:31
  • @simbabque: The regex in your solution. Just like in Perl, it's best to use different delimiters if the pattern itself contains slashes, and double quotes don't need escaping anyway. – Borodin Jul 26 '16 at 10:45
  • @borodin I just copied it from the question. You're right of course. – simbabque Jul 26 '16 at 10:56
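To illustrate the delimiter advice in these comments, here is a sketch that compares the slash-delimited pattern used in this answer with the same pattern written between `|` delimiters, where neither `/` nor `"` needs escaping. The shortened `$html` is an assumption for brevity. Both calls should return identical arrays.

```php
<?php
// Shortened version of the question's $html (assumption for brevity).
$html = '<tr class="aaa"><td class="bbb">111.111.111.111</td><td>443</td></tr>'
      . '<tr class="aaa"><td class="bbb">222.222.222.222</td><td>443</td></tr>';

// Slash delimiters: every / inside the pattern must be escaped.
// (\" is harmless in a single-quoted string and PCRE reads it as ".)
$slashes = '/td class=\"bbb\">(.*?)<\/td><td>(\d+)<\/td>/';
// Pipe delimiters: the HTML can be written as-is.
$pipes   = '|td class="bbb">(.*?)</td><td>(\d+)</td>|';

preg_match_all($slashes, $html, $a);
preg_match_all($pipes, $html, $b);

print_r($b[1]); // the captured IP addresses
```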