$doc->getElementById('id'), $doc->getElementsByName('id') not working

Question

Possible Duplicate:
PHP HTML DomDocument getElementById problems

I'm trying to extract info from Google searches in PHP and find that I can read the search URLs without a problem, but getting anything out of them is a whole different issue. After reading numerous posts and the applicable PHP docs, I came up with the following:

// get large panoramas of montana
$url = 'http://www.google.com/search?q=montana+panorama&tbm=isch&biw=1408&bih=409';
$html = file_get_contents($url);
// was getting tons of "entity parse" errors, so added
$html = htmlentities($html, ENT_COMPAT, 'UTF-8', true); // tried false as well

$doc = new DOMDocument();
//$doc->strictErrorChecking = false; // tried both true and false here, same result
$result = $doc->loadHTML($html);

//echo $doc->saveHTML(); this shows that the tags I'm looking for are in fact in $doc

if ($result === true)
{
    var_dump($result); // prints 'true'
    $tags = $doc->getElementById('center_col');
    $tags = $doc->getElementsByTagName('td');
    var_dump($tags); // previous 2 lines both print NULL
}

I've verified that the ids and tags I'm looking for are in the HTML with error_log($html), and in the parsed doc with $doc->saveHTML(). Does anyone see what I'm doing wrong?

Edit:

Thanks all for the help, but I've hit a wall with DOMDocument. Nothing in any of the docs, or other threads, works with Google image queries. Here's what I tried:

I looked at the link @Jon posted and tried all the suggestions there, then looked at the getElementById docs and read all the comments there as well. I'm still getting empty result sets. Better than NULL, but not by much.

I tried the xpath trick:

$xpath  = new DOMXPath($doc);
$ccol   = $xpath->query("//*[@id='center_col']");

Same result, an empty set.
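
For comparison, here is a minimal sketch of the same XPath query run against a small inline document. The `$sample` markup is a hypothetical stand-in for the fetched page, not Google's actual output. It only illustrates that `query()` returns a DOMNodeList, so the thing to inspect is its `length` rather than whether the result is NULL.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$sample = '<!DOCTYPE html><html><body>'
        . '<div id="center_col"><table><tr><td>cell</td></tr></table></div>'
        . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($sample);

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//*[@id='center_col']");

// query() hands back a DOMNodeList, so test ->length to see if anything matched.
$matchCount = $nodes->length;
$firstTag   = $matchCount > 0 ? $nodes->item(0)->nodeName : null;

echo $matchCount, ' match(es); first matching tag: ', var_export($firstTag, true), "\n";
```

If a query like this matches on plain markup but comes back empty on the real input, that suggests the parser never saw the element as markup, which points at whatever was done to the string before loadHTML().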

I did an error_log($html) directly after the file read, and the document has a doctype "", so it's not that.

I also see there that user "carl2088" says "From my experience, getElementById seem to work fine without any setups if you have loaded a HTML document". Not in the case of Google image queries, it would appear.

In desperation, I tried

echo count(explode('center_col', $html));

to see if for some strange reason it disappears after the initial error_log($html). It's definitely there, the string is split into 4 chunks.
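
As a side note, the chunk count is always one more than the number of occurrences, so 4 chunks means the string appears 3 times. A small sketch with a hypothetical sample string (not the real page) shows the relationship, using substr_count() as the more direct way to count:

```php
<?php
// Hypothetical sample text; in the question this would be the fetched $html.
$html = '<div id="center_col">x</div> #center_col{} .center_col';

// explode() returns the pieces around each occurrence of the delimiter.
$chunks = count(explode('center_col', $html));

// substr_count() counts the occurrences themselves.
$occurrences = substr_count($html, 'center_col');

echo $chunks, ' chunks, ', $occurrences, " occurrences\n";
```

Either check only shows that the text is present somewhere in the raw string. It says nothing about whether the parser turned it into an element with that id.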

I checked my version of PHP (5.3.15), compiled Aug. 25 2012, so it's not too old to support getElementById.

Before yesterday, I had been using an extremely ugly series of "explodes" to get the info, and while it's horrid code, it took 45 minutes to write and it works.

I'd really like to ditch my "explode" hack, but 5 hours to achieve nothing versus 45 minutes to get something that works makes it really difficult to do things the right way.

If anyone else with experience using DOMDocument has some additional tricks I could try, it would be much appreciated.

You are overwriting the first output of `$tags`. var_dump that separately. – Pekka Oct 18 '12 at 11:50
I'm not really, I just included both to show how each was attempted. – user1755989 Oct 18 '12 at 11:51

score 0 · Answer 1 · answered Oct 18 '12 at 11:43

Are you using the JavaScript getElementById and getElementsByTagName? If so, that is the problem:

    $tags = $doc->getElementById('center_col');
    $tags = $doc->getElementsByTagName('td');

answered Oct 18 '12 at 11:43

NullPoiиteя

What does this mean? These are PHP functions... – lonesomeday Oct 18 '12 at 11:44
This is a PHP question, not Javascript. – user1755989 Oct 18 '12 at 12:14

score 0 · Answer 2 · edited May 23 '17 at 11:48

You will need to validate your document with DOMDocument::validate(), or set DOMDocument::validateOnParse to true, before calling $doc->getElementById('center_col'):

$doc->validateOnParse = true;
$doc->loadHTML($html);

stackoverflow: getelementbyid-problem

http://php.net/manual/de/domdocument.getelementbyid.php

It's in the question @Jon posted in his comment!
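
One more thing that may be worth checking is the htmlentities() call in the question's code. It converts `<` and `>` to entities, so the parser may end up seeing one long run of text rather than tags. Below is a minimal sketch that loads the same markup both ways. The `$sample` string is a hypothetical stand-in for the fetched page, and libxml_use_internal_errors(true) is used here as an assumed alternative for quieting the parse warnings. It does not set validateOnParse.

```php
<?php
// Hypothetical sample markup; a real page would come from file_get_contents().
$sample = '<!DOCTYPE html><html><body>'
        . '<div id="center_col"><table><tr><td>cell</td></tr></table></div>'
        . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Case 1: load the markup unchanged.
$raw = new DOMDocument();
$raw->loadHTML($sample);
$rawDiv   = $raw->getElementById('center_col');
$rawCells = $raw->getElementsByTagName('td')->length;

// Case 2: escape the markup first, as the question's code does.
$escaped = new DOMDocument();
$escaped->loadHTML(htmlentities($sample, ENT_COMPAT, 'UTF-8', true));
$escapedDiv   = $escaped->getElementById('center_col');
$escapedCells = $escaped->getElementsByTagName('td')->length;

// Discard the collected warnings once parsing is done.
libxml_clear_errors();

var_dump($rawDiv !== null, $rawCells, $escapedDiv !== null, $escapedCells);
```

With this sample, the unescaped document should yield the div and one td, while the escaped one should yield neither, because its tags were turned into plain text before parsing. Whether that fully explains the Google results is not something this sketch can confirm.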

edited May 23 '17 at 11:48

Community

answered Oct 18 '12 at 12:41

moskito-x
