I'm using regex to pull info from an HTML table.

But I'm messing up somehow, and have no idea why.

PHP CODE:

$printable = file_get_contents('./testplaylist.php', true);

if(preg_match_all('/<TR[^>]*>(.*?)<\/TR>/si', $printable, $matches, PREG_SET_ORDER)); {
foreach($matches as $match) {
$data = "$match[1]";

echo("$data <br />");

}
}

HTML DATA:

<TR class=" light ">
Stuff in here
</TR>

Any help would be appreciated,

Thanks!

  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Aug 18 '11 at 10:26
  • _(related)_ [Best methods to parse HTML with PHP](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php) – edorian Aug 18 '11 at 10:28
  • Also, since the extension is `.php`: is the HTML snippet you show actually in that file, or will the PHP file generate it when executed? In the latter case, you have to `file_get_contents` from a webserver. – Gordon Aug 18 '11 at 10:32

4 Answers


I know what your first problem is: regex! I kid! But have you checked out PHP DOM?

http://www.php.net/manual/en/domdocument.loadhtmlfile.php

It would probably work in your case just fine. It would be 10x easier too.

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. -Jamie Zawinski
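
For what it's worth, here is a minimal sketch of the DOM route. It assumes the rows sit inside a normal table with cells; the sample markup and the `$rows` variable are made up for illustration, and in practice the HTML would come from `file_get_contents()` as in the question.

```php
<?php
// Hypothetical sample input; the second row is invented for illustration.
$html = '<table>'
      . '<TR class=" light "><td>Stuff in here</td></TR>'
      . '<TR><td>More stuff</td></TR>'
      . '</table>';

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$rows = [];
// The HTML parser lowercases tag names, so 'tr' also finds rows written as <TR>.
foreach ($doc->getElementsByTagName('tr') as $row) {
    $rows[] = trim($row->textContent);
}

foreach ($rows as $row) {
    echo "$row <br />\n";
}
```

Note that `textContent` drops the inner markup; if you need the cells separately, loop over the `td` children of each row instead.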

Matt

Works fine here. It should work unless you have nested tables.

The problem must be in your data source. Do some tracing with var_dump.
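
A rough sketch of that kind of tracing, assuming the same pattern as in the question. `traceRows` is a hypothetical helper name, not anything built in. Keep in mind that reading a `.php` file from disk gives you its PHP source, not the HTML it would generate, so dumping the first few hundred bytes is worth doing first.

```php
<?php
// Hypothetical tracing helper; the name and messages are illustrative only.
function traceRows($printable)
{
    if ($printable === false) {
        // file_get_contents() returns false on failure (missing file, bad path).
        echo "Could not read the source\n";
        return [];
    }

    var_dump(strlen($printable));         // is there any data at all?
    var_dump(substr($printable, 0, 200)); // does it look like the expected HTML?

    $count = preg_match_all('/<TR[^>]*>(.*?)<\/TR>/si', $printable, $matches, PREG_SET_ORDER);
    var_dump($count);                     // number of rows matched, or false on error

    if ($count === false) {
        var_dump(preg_last_error());      // PCRE error code, e.g. backtrack limit hit
        return [];
    }

    // With PREG_SET_ORDER each match is [full match, group 1]; keep group 1.
    return array_column($matches, 1);
}

// Sample row copied from the question.
$rows = traceRows("<TR class=\" light \">\nStuff in here\n</TR>");
```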

Karoly Horvath

Use PHP's document object model to be safe when parsing HTML. Except for very simple regexes, HTML parsing rapidly gets out of control when you DIY. There's a bit of overhead to set it up, but once you get going it's straightforward.

See DOM for instructions on how to use it.
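
As an illustration of what you get once the setup is done, here is a sketch that uses `DOMXPath` to pick out only the rows carrying the `light` class. The second row and its `dark` class are assumptions added for contrast; only the `light` row comes from the question.

```php
<?php
// Hypothetical sample markup; the "dark" row is made up for contrast.
$html = '<table>'
      . '<TR class=" light "><td>Stuff in here</td></TR>'
      . '<TR class=" dark "><td>Other stuff</td></TR>'
      . '</table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match rows whose class list contains the token "light",
// whatever whitespace surrounds it in the attribute.
$query = "//tr[contains(concat(' ', normalize-space(@class), ' '), ' light ')]";

$light = [];
foreach ($xpath->query($query) as $row) {
    $light[] = trim($row->textContent);
}

print_r($light);
```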

If you stick to the regex technique, at the very least you may need to escape every '<' and '>', e.g.:

/\<TR[^>]*\>(.*?)\<\/TR\>/si
Pete855217
  • No, `<` and `>` have no special meaning in regex. – Karoly Horvath Aug 18 '11 at 10:33
  • Yes, avoid regexes like the plague. Then, if you *have* to use regexes, at least escape the angle brackets, or better yet: quote literal parts of your pattern. E.g.: `$pattern = '/\\Q*]\\>(.*?)\\Q\\E/si';` – user268396 Aug 18 '11 at 10:36
  • @yi_H: On the contrary, `<` and `>` do have special meaning in PHP's interpretation of Perl regexes, which is what the preg module provides. They allow you to name subpatterns: `(?P<name>sub_pattern)`. – user268396 Aug 18 '11 at 10:40
  • Uhm.. no. `?'name'` is also a named subpattern. But you normally don't escape `'`, right? You only have to escape where it would be misinterpreted. – Karoly Horvath Aug 18 '11 at 10:50
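
A quick way to settle the escaping question is to run both forms against the sample row from the question. This is only a sketch; the variable names and the `row` group name are made up for illustration.

```php
<?php
// Sample row copied from the question.
$html = "<TR class=\" light \">\nStuff in here\n</TR>";

// Same pattern with and without backslashes before the angle brackets.
$plain   = '/<TR[^>]*>(.*?)<\/TR>/si';
$escaped = '/\<TR[^>]*\>(.*?)\<\/TR\>/si';

preg_match($plain, $html, $a);
preg_match($escaped, $html, $b);

// Compare the two result arrays: escaping is harmless but changes nothing.
var_dump($a === $b);

// Angle brackets only act as syntax inside a group opener such as (?P<name>...).
preg_match('/<TR[^>]*>(?P<row>.*?)<\/TR>/si', $html, $named);
echo trim($named['row']), "\n";
```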

Try this one instead

http://sandbox.phpcode.eu/g/bba70.php

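// /s lets . match newlines and /i also matches lowercase <tr>.
// /U is left out: it inverts greediness, so (.*?) would become greedy and merge all rows.
if (preg_match_all('/<TR[^>]*>(.*?)<\/TR>/si', $printable, $matches)) {
    foreach ($matches[1] as $match) {
        echo("$match <br />");
    }
}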
genesis