I'm trying to use preg_match to get the second match.

<?php
$url = "http://domain.com";
preg_match('~<table([^>]*)(class\\s*=\\s*["\']ladder-table["\'])([^>]*)>(.*?)</table>~i', file_get_contents($url), $match);
print $match[0];    
?>

Here is the table I'm trying to find:

<table class="ladder-table">Content</table>
<table class="ladder-table">Content</table> <-- [This one]
<table class="ladder-table">Content</table>

The last two tables are hidden by JavaScript. Does that affect the pattern?

RobbyBubble
  • preg_match only checks for a single match. You want preg_match_all :-) – cmbuckley Jan 29 '13 at 00:14
  • **Don't use regular expressions to parse HTML**. You cannot reliably parse HTML with regular expressions. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php.html for examples of how to properly parse HTML with PHP modules. – Andy Lester Jan 29 '13 at 00:17

1 Answer

If you want to continue to use regular expressions, use preg_match_all:

$url = "http://domain.com";
preg_match_all('~<table([^>]*)(class\\s*=\\s*["\']ladder-table["\'])([^>]*)>(.*?)</table>~i', file_get_contents($url), $match);
print_r($match[0][1]);

This may be enough for your requirements. However, it's difficult to make a regex-based approach robust against changes to the HTML. For instance, the pattern above wouldn't match if Content contained any newlines, because .*? does not match newline characters unless you add the s (PCRE_DOTALL) modifier.
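
To illustrate the newline point, here is a small sketch that runs the same pattern with and without the s modifier. The inline sample markup and the variable names are made up for the demonstration; they are not taken from the real page.

```php
<?php
// Hypothetical sample markup: the second table's content spans several lines.
$html = <<<HTML
<table class="ladder-table">First</table>
<table class="ladder-table">
Second
</table>
<table class="ladder-table">Third</table>
HTML;

// Same pattern as above, without delimiters or modifiers.
$pattern = '<table([^>]*)(class\\s*=\\s*["\']ladder-table["\'])([^>]*)>(.*?)</table>';

// Without "s", "." does not match newlines, so the multi-line table is skipped.
$countWithout = preg_match_all('~' . $pattern . '~i', $html, $without);

// With "s" (PCRE_DOTALL), "." also matches newlines, so every table matches.
$countWith = preg_match_all('~' . $pattern . '~is', $html, $with);

echo $countWithout, "\n";        // number of matches without the s modifier
echo $countWith, "\n";           // number of matches with the s modifier
echo trim($with[4][1]), "\n";    // inner content of the second table
```

With this sample input, the first call finds two tables and the second finds all three, so index 1 points at a different table depending on the modifier.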

The correct way to handle this would be using a proper HTML parser such as DOM or others.
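
As a rough sketch of the DOM approach, assuming the page can be loaded with PHP's bundled DOMDocument and DOMXPath classes, something like the following would pick out the second table. The inline markup is a hypothetical stand-in for the result of file_get_contents($url).

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><body>
<table class="ladder-table"><tr><td>First</td></tr></table>
<table class="ladder-table"><tr><td>Second</td></tr></table>
<table class="ladder-table"><tr><td>Third</td></tr></table>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select tables whose class attribute contains the token "ladder-table".
$tables = $xpath->query(
    '//table[contains(concat(" ", normalize-space(@class), " "), " ladder-table ")]'
);

// item() is zero-based, so index 1 is the second table (null if fewer than two exist).
$second = $tables->item(1);
if ($second !== null) {
    echo $doc->saveHTML($second), "\n";     // the table's markup
    echo trim($second->textContent), "\n";  // its text content
}
```

Since file_get_contents returns the raw markup and does not execute JavaScript, tables that are only hidden by a script should still be found, provided they are present in the served HTML rather than generated by the script.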

cmbuckley