I'm working on a personal project that fetches the school/business closings from my local weather station's website and displays them on my personal site. Since the station's site doesn't offer an RSS feed (sadly), I was thinking of scraping the page with PHP, but I only want to show the element with a certain ID. Is this possible?

My PHP code is:

<?php
$url = 'http://website.com';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
echo $output;
?>

I was thinking of using preg_match, but I'm not sure of the syntax or whether it's even the right function. The element I want to show is #LeftColumnContent_closings_dg.

jprofitt
Charlie
  • DOM parsing is generally accepted as the preferred way to parse HTML/XML content over regexes. You'll want to employ PHP's DOMDocument with an xpath query to pull out the specific bits of information you're looking for. –  Jan 02 '12 at 19:56
  • @MarcB OMG that's a motherload of upvotes. Canonical is the right word ... –  Jan 02 '12 at 20:02
  • @rdlowrey: it's gotten to the point that any question on SO that involves html+regex should just get auto-closed and pointed at that answer. – Marc B Jan 02 '12 at 20:03
  • @MarcB That's the best answer I've seen on SO :D – Charlie Jan 02 '12 at 20:15

2 Answers

Here's an example using DOMDocument. It pulls the text from the first <h1> element with id="test":

$html = '
<html>
<body>
<h1 id="test">test element text</h1>
<h1>test two</h1>
</body>
</html>
';

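$dom = new DOMDocument;
$dom->loadHTML($html);

// Query for every <h1> whose id attribute equals "test"
$xpath = new DOMXPath($dom);
$res = $xpath->query('//h1[@id="test"]');

// item(0) is NULL when nothing matched
if ($res->item(0) !== NULL) {
  $test = $res->item(0)->nodeValue; // "test element text"
}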
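
The same approach can be pointed at the ID from the question. What follows is only a sketch under a few assumptions: `extractElementById` is a hypothetical helper name, the sample markup is invented as a stand-in for the real page (which isn't shown in the question), and the ID is assumed to contain no double quotes. In practice `$output` would be the string returned by `curl_exec()` in the question's code.

```php
<?php
// Hypothetical helper: returns the markup of the element with the given id,
// or null when the page is empty or has no such element.
// Assumes $id contains no double quotes, since it is placed inside the XPath string.
function extractElementById(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();

    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match any element (not just <h1>) whose id attribute equals $id.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // saveHTML($node) serializes only that node, tags included.
    return $dom->saveHTML($nodes->item(0));
}

// Invented stand-in markup; replace with the $output fetched via cURL.
$output = '<html><body><div id="other">ignore me</div>'
        . '<table id="LeftColumnContent_closings_dg">'
        . '<tr><td>Example School: Closed</td></tr>'
        . '</table></body></html>';

$closings = extractElementById($output, 'LeftColumnContent_closings_dg');
echo $closings ?? 'Closings element not found';
```

Using `saveHTML()` on the node keeps the element's markup so it can be echoed into your own page; use `nodeValue` instead if you only want the text.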

A library I've used with great success for this sort of thing is phpQuery: http://code.google.com/p/phpquery/

You basically get your website into a string (like you have above), then do:

phpQuery::newDocument($output);

$titleElement = pq('title');
$title = $titleElement->html();

For instance, that would get the contents of the title element. The benefit is that all the methods are named after their jQuery counterparts, which makes the library easy to learn if you already know jQuery.

Rich Bradshaw