How can I extract data from an HTML table in PHP?

Question

Possible Duplicate:
How to parse and process HTML with PHP?

Let's say I want to extract a certain number or piece of text from a table on this page: http://www.fifa.com/associations/association=chn/ranking/gender=m/index.html

I want to get the first number in the right-hand table, in the td under "FIFA Ranking position". That would be 88 right now. Upon inspection, it is <td class="c">88</td>.

How would I use PHP to extract the info from said webpage?

Edit: I am told that jQuery/JavaScript is better suited for this.

@MisterMelancholy You could do this in a lot of languages. I would use a DOM parser like http://simplehtmldom.sourceforge.net/ to get the information I needed. — shapeshifter, Dec 06 '12 at 05:03
Use a proper HTML parser. Don't use regexes. They are not up to the task. — Andy Lester, Dec 06 '12 at 05:22

score 1 · Accepted Answer · 2012-12-06T07:00:47.130

This could probably be prettier, but it'd go something like:

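<?php
// Fetch the page source (requires allow_url_fopen to be enabled).
$page = file_get_contents("http://www.fifa.com/associations/association=chn/ranking/gender=m/index.html");
// "#" is the delimiter, so the "/" in "</td>" needs no escaping; the group captures the digits.
preg_match_all('#<td class="c">([0-9]+)</td>#', $page, $matches);
// $matches[1] holds the captured numbers, one per matching cell.
foreach ($matches[1] as $match) {
    echo $match, "\n";
}
?>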

I've never done anything like this before with PHP, so it may not work.
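
One way to check the pattern without hitting the live site is to run it against a small inline sample. The snippet below is only a sketch: the sample markup and the function name extractRankCells are made up for illustration and are not taken from the FIFA page. It also shows why the comments recommend a parser, since the pattern only matches cells written exactly as <td class="c">...</td>.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$sample = '<table>'
    . '<tr><td class="c">88</td><td class="c">91</td></tr>'
    . "<tr><td class='c'>95</td><td class=\"c\" align=\"right\">97</td></tr>"
    . '</table>';

/**
 * Return the integers found in cells written exactly as <td class="c">N</td>.
 * extractRankCells is an illustrative name, not part of any library.
 */
function extractRankCells(string $html): array
{
    // "#" is the delimiter, so the "/" in "</td>" needs no escaping.
    preg_match_all('#<td class="c">([0-9]+)</td>#', $html, $matches);
    // $matches[1] holds the text captured by the first group.
    return array_map('intval', $matches[1]);
}

$ranks = extractRankCells($sample);
print_r($ranks); // contains 88 and 91 only
```

The cells holding 95 and 97 are skipped because one uses single quotes and the other carries an extra attribute, which is the kind of variation a DOM parser handles and a literal pattern does not.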

If you can work your magic after the page has loaded, you can use JavaScript/jQuery:

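<script type='text/javascript'>
var arr = [];

// Collect the contents of every <td class="c"> cell inside a table.
jQuery('table td.c').each(function () {
    arr.push(jQuery(this).html());
});

// A top-level script cannot return a value, so log the result instead.
console.log(arr);
</script>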

Also, sorry for deleting my comment. You weren't specific about what needed to be done, so I initially thought jQuery would better fit your needs, but then I thought, "Maybe you want to get the page content before an HTML page is loaded".

score 0 · Answer 2 · answered Dec 06 '12 at 05:07

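Try http://simplehtmldom.sourceforge.net/:

include 'simple_html_dom.php'; // adjust the path to wherever the library file lives
$html = file_get_html('http://www.fifa.com/associations/association=chn/ranking/gender=m/index.html');
echo $html->find('div.rankings', 0)->find('table', 0)->find('tr',0)->find('td.c',0)->plaintext;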

This is untested; I am just going by the page source. I'm sure you could target the cell more directly.

In fact,

echo $html->find('div.rankings', 0)->find('td.c',0)->plaintext;

should work.

score 0 · Answer 3 · answered Dec 06 '12 at 05:15

Using DOMDocument, which is bundled with PHP and enabled by default in most installations:

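$dom = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml's parse warnings out of the output.
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents("http://www.example.com/file.html"));
$xpath = new DOMXPath($dom);
// Select the first <td> whose class attribute is exactly "c".
$cell = $xpath->query("//td[@class='c']")->item(0);
if ($cell) {
    $number = intval(trim($cell->textContent));
    // do stuff
}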
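
The same approach can be tried offline on an inline string. This is a rough sketch under a few assumptions: the sample markup is invented, cellsWithClassC is an illustrative name rather than anything in the DOM extension, and the XPath expression is swapped for a token-based class test so that a cell with class="c highlight" is matched as well.

```php
<?php
// Hypothetical sample markup; the real page would come from file_get_contents().
$html = '<html><body><table>'
    . '<tr><td class="l">China PR</td><td class="c">88</td></tr>'
    . '<tr><td class="c highlight"> 91 </td></tr>'
    . '</table></body></html>';

/**
 * Return the integer values of all <td> cells whose class list contains "c".
 * cellsWithClassC is an illustrative name, not part of the DOM extension.
 */
function cellsWithClassC(string $html): array
{
    // Collect parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($dom);
    // Treat "c" as one whitespace-separated class token, so class="c highlight" also counts.
    $query = "//td[contains(concat(' ', normalize-space(@class), ' '), ' c ')]";
    $values = [];
    foreach ($xpath->query($query) as $cell) {
        $values[] = intval(trim($cell->textContent));
    }
    return $values;
}

$values = cellsWithClassC($html);
print_r($values); // contains 88 and 91
```

The exact-match test @class='c' used above is simpler and is enough when the cell has a single class; the token test only matters if the markup adds more classes to the same cell.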
