I'm trying to write a script that echoes only the div that encloses the image on Google.

$url = "http://www.google.com/";
$page = file($url);

foreach($page as $theArray) {
echo $theArray;
}

The problem is that this echoes the whole page. I want to echo only the part between <div id="lga"> and the next closing </div>. Note: I tried using if statements, but they didn't work, so I deleted them.

Thanks

Matti Virkkunen
User1

2 Answers

2

To do this, you need to parse the DOM and then look up the element by its ID. Check out a parsing library such as http://simplehtmldom.sourceforge.net/manual.htm

After feeding your html document into the parser you could call something like:

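// $page comes from file() in the question, which returns an array of lines; join them first.
$html = str_get_html(implode('', $page));
// Passing 0 as the second argument returns the first match instead of an array.
$element = $html->find('div[id=lga]', 0);
// plaintext gives only the text; use $element->outertext to echo the div's markup.
echo $element->plaintext;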

That, I think, would be your quickest and easiest solution.

dmcnelis
2

Use the built-in DOM methods:

<?php

$page = file_get_contents("http://www.google.com");

$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($page);
libxml_use_internal_errors(false);

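$domx = new DOMXPath($domd);
// item(0) returns null when nothing matches, so check before importing.
$lga = $domx->query("//*[@id='lga']")->item(0);
if ($lga === null) {
    exit("No element with id 'lga' found\n");
}

// Copy the node (and its children) into a fresh document so only it is output.
$domd2 = new DOMDocument();
$domd2->appendChild($domd2->importNode($lga, true));

echo $domd2->saveHTML();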
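
The same approach could be wrapped in a small reusable function, roughly as sketched below. The helper name extractElementById and the sample markup are assumptions made for illustration; neither comes from the question. The sketch works on an HTML string rather than fetching a page, and it passes the node to saveHTML() instead of importing it into a second document.

```php
<?php
// Hypothetical helper (the name is illustrative): returns the outer HTML of the
// element with the given id, or null when no such element exists.
function extractElementById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Collect parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // $id is interpolated as-is, so it must not contain a single quote.
    $node = $xpath->query("//*[@id='" . $id . "']")->item(0);

    // saveHTML() accepts a node and serializes just that subtree.
    return $node === null ? null : $doc->saveHTML($node);
}

// Sample markup standing in for the downloaded page (an assumption).
$sample = '<html><body><div id="lga"><img src="logo.png" alt="Logo"></div><p>Other</p></body></html>';
echo extractElementById($sample, 'lga'), "\n";
```

To use it on a live page, the string from file_get_contents() would be passed in place of $sample.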
Maerlyn
  • +1, but note that this may not be friendly if the text isn't properly parse-able – zanlok Feb 03 '11 at 01:08
  • PHP's DOM library is pretty powerful when it comes to handling invalid html, I only met a few problematic pages so far, but they all were FUBAR. – Maerlyn Feb 03 '11 at 01:15
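
To get a feel for how the parser copes with invalid markup, something like the following sketch could be tried. The broken snippet is made up for illustration, and the number of problems reported may differ between libxml versions, so treat the output as indicative only.

```php
<?php
// Deliberately broken markup (illustrative): unclosed <p> and <b>, stray </span>.
$broken = '<div id="lga"><p>Logo <b>here</span></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($broken);
// Problems libxml noticed while recovering; the count varies by version.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($doc);
$node = $xpath->query("//*[@id='lga']")->item(0);

echo count($errors), " parse problem(s) reported\n";
echo $node === null ? "element not found\n" : $doc->saveHTML($node) . "\n";
```

If the element is still found and serialized, the parser has recovered well enough for this kind of extraction.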