
I want to get the values inside the `td` elements. I can do it with `getElementsByTagName`, but I could not manage it using `getElementById`.

The HTML might look like this:

<table id="myid">
<tr>
<td>value1</td>
<td>value2</td>
</tr>
<tr>
<td>value1</td>
<td>value2</td>
</tr>
</table>

The PHP used to access the values is:

<?php
$dom = new DOMDocument();
$dom->loadHTMLFile('http://remoteDomain/thispage.html');
// Look up the table by its id attribute
$table = $dom->getElementById('myid');

// Read the text of the first cell in each row
foreach ($table->getElementsByTagName('tr') as $key => $tr) {
    $tr->getElementsByTagName('td')->item(0)->nodeValue;
}
?>

EDIT

I get the error: `Fatal error: Call to a member function getElementsByTagName() on a non-object in ...`

EDIT2

PHP info:

DOM/XML enabled
DOM/XML API Version 20031129
libxml Version 2.7.3

Operating system: Windows

mustafa
  • you forgot to `echo` the nodeValue – Gordon Mar 08 '12 at 10:54
  • In case you get "Call to a member function getElementsByTagName() blah" make sure to have a recent version of libxml. You can either install those manually or by upgrading PHP to a more recent version. – Gordon Mar 08 '12 at 10:58
  • exact duplicate of http://stackoverflow.com/questions/9605362/is-it-possible-to-achieve-same-using-getelementbyid – Gordon Mar 08 '12 at 11:11
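Putting the first two comments together, here is a minimal sketch of the question's loop that checks the return value of `getElementById` before using it (so a missing match gives a clear message instead of the "non-object" fatal error) and actually echoes each value. It assumes the HTML is passed in as a string via `loadHTML` rather than fetched from the remote URL; the function name `firstCellValues` is hypothetical.

```php
<?php
// Hypothetical helper: returns the text of the first <td> in every row
// of the table with the given id.
function firstCellValues(string $html, string $tableId): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about imperfect HTML instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $table = $dom->getElementById($tableId);
    if ($table === null) {
        // getElementById returns null when no element is registered with
        // that id; calling a method on null causes the fatal error.
        throw new RuntimeException("No element with id '$tableId' found");
    }

    $values = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $td = $tr->getElementsByTagName('td')->item(0);
        if ($td !== null) {
            $values[] = $td->nodeValue;
        }
    }
    return $values;
}

$html = '<table id="myid">'
      . '<tr><td>value1</td><td>value2</td></tr>'
      . '<tr><td>value1</td><td>value2</td></tr>'
      . '</table>';

foreach (firstCellValues($html, 'myid') as $value) {
    echo $value, "\n"; // the echo missing from the question's loop
}
```

With an outdated libxml, `getElementById` may still return `null` here, in which case the exception makes that visible instead of crashing on the next line.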

4 Answers


The problem is that getElementById needs a DOCTYPE. If you add

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

at the beginning of the file, it should work (you should also add `html` and `body` tags).

Edit: You also need to set `$dom->validateOnParse = true;` before you load the HTML file.

This is apparently a "feature" of the `DOMDocument` class; see the comments at http://php.net/manual/en/domdocument.getelementbyid.php
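A sketch of this answer's suggestion, assuming the markup is available as a string: prepend the HTML 4.01 Strict DOCTYPE, wrap the table in `html`/`body`, and set `validateOnParse` before loading. As the comments below point out, with a recent libxml the lookup usually succeeds even without these steps, so treat this as the answer's workaround rather than a requirement. The `title` text is an arbitrary placeholder.

```php
<?php
$doctype = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
         . '"http://www.w3.org/TR/html4/strict.dtd">';
$body = '<table id="myid">'
      . '<tr><td>value1</td><td>value2</td></tr>'
      . '<tr><td>value1</td><td>value2</td></tr>'
      . '</table>';
// Wrap the table in a complete document, as the answer recommends.
$html = $doctype
      . '<html><head><title>Example</title></head><body>'
      . $body
      . '</body></html>';

$dom = new DOMDocument();
// Set before loading, per the answer's edit.
$dom->validateOnParse = true;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$table = $dom->getElementById('myid');
$cells = [];
if ($table !== null) {
    foreach ($table->getElementsByTagName('td') as $td) {
        $cells[] = $td->nodeValue;
    }
}
echo implode(', ', $cells), "\n";
```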

Gordon
  • `loadHTMLFile` will add any missing HTML skeleton and treat the HTML as HTML 4 Transitional. – Gordon Mar 08 '12 at 10:52
  • validateOnParse will most of the time not help. Upgrading PHP or rather libxml will. – Gordon Mar 08 '12 at 11:01
  • That's what the PHP doc says though. I tried it with `validateOnParse` turned off and it didn't work then (PHP 5.3.6; Ubuntu). –  Mar 08 '12 at 11:04
  • see http://codepad.viper-7.com/V9BFhj. no validateOnParse but recent libxml. then compare with http://codepad.org/s3r6mggB – Gordon Mar 08 '12 at 11:14
  • Yeah, it's not that I don't believe you, I just wanted to point out that it worked for me :) –  Mar 08 '12 at 11:15
  • If I have created my document by starting with a `new DOMDocument()` and composing it from there, how do I add the DOCTYPE tag? – Jay Bienvenu Jan 08 '18 at 17:53
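If upgrading libxml is not an option, one workaround that does not depend on `id` attributes being registered as IDs is to select the element by its attribute value with `DOMXPath`. This technique is swapped in here and is not taken from this thread. A minimal sketch, again assuming the HTML is already in a string:

```php
<?php
$html = '<table id="myid">'
      . '<tr><td>value1</td><td>value2</td></tr>'
      . '<tr><td>value1</td><td>value2</td></tr>'
      . '</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match on the attribute value itself instead of calling getElementById.
$rows = $xpath->query('//table[@id="myid"]//tr');

$firstCells = [];
foreach ($rows as $tr) {
    // "td[1]" is the first td child of the current row (XPath is 1-based).
    $td = $xpath->query('td[1]', $tr)->item(0);
    if ($td !== null) {
        $firstCells[] = $td->nodeValue;
    }
}
print_r($firstCells);
```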

You need quotes around your id attribute:

<table id="myid">
reach4thelasers
  • The original table has quotes around id – mustafa Mar 08 '12 at 10:51
  • Uh well your question didn't show that until you changed it - but thanks for the downvote - last time I answer one of your questions. – reach4thelasers Mar 08 '12 at 10:57
  • He doesn't need quotes around the id attribute as long as he loads the HTML with loadHTML or loadHTMLFile. libxml will then use the HTML parser module and autocorrect that. The answer is just wrong. – Gordon Mar 08 '12 at 11:02
  • He stated that the HTML which was outputted was id=myid - that's not valid HTML and the ID attribute would be unusable. – reach4thelasers Mar 08 '12 at 11:14
  • it would be invalid but not unusable. libxml's html parser can handle invalid html and will autocorrect it. see http://codepad.viper-7.com/V9BFhj. The issue here is the outdated libxml version. Adding quotes around the id will do nothing there. See http://codepad.org/7cIFjwrw – Gordon Mar 08 '12 at 11:16
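To check the claim that libxml's HTML parser accepts an unquoted attribute value, here is a small sketch that loads `id=myid` without quotes and reads the attribute back. It reads the value through `getAttribute` rather than `getElementById`, so the result does not depend on the libxml version issue discussed above.

```php
<?php
// Unquoted attribute value, as in the question's original markup.
$html = '<table id=myid><tr><td>value1</td><td>value2</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$table = $dom->getElementsByTagName('table')->item(0);
// The parser stores the value just as if it had been quoted.
echo $table->getAttribute('id'), "\n";
```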

`getElementById`, as the name suggests, will only work if the element you're targeting actually has an `id`.
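To illustrate: in the question's markup only the `table` carries an `id`, so it is the only element `getElementById` can return. Asking for an id that no element has yields `null`, and calling a method on that `null` is what produces the "non-object" fatal error. A sketch, assuming a recent libxml; the id `'missing'` is just an example value.

```php
<?php
$html = '<table id="myid"><tr><td>value1</td><td>value2</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$found = $dom->getElementById('myid');      // the <table> element
$missing = $dom->getElementById('missing'); // no element has this id

var_dump($found instanceof DOMElement); // bool(true)
var_dump($missing);                     // NULL
```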

MrCode

As I understand it, you want to use `getElementById` in the `foreach`, and your code is otherwise working.

Note that the names differ on purpose: `getElementsByTagName` (plural, returns a list of elements) versus `getElementById` (singular, returns one element).
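The difference in return types can be seen directly: `getElementsByTagName` returns a `DOMNodeList` (possibly empty), whereas `getElementById` returns a single `DOMElement` or `null`. A short sketch using the question's markup as a string:

```php
<?php
$html = '<table id="myid">'
      . '<tr><td>value1</td><td>value2</td></tr>'
      . '<tr><td>value1</td><td>value2</td></tr>'
      . '</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Plural: a list of every matching element.
$cells = $dom->getElementsByTagName('td');
echo get_class($cells), ' with ', $cells->length, " items\n"; // DOMNodeList with 4 items

// Singular: at most one element, looked up by its unique id.
$table = $dom->getElementById('myid');
echo $table === null ? 'null' : get_class($table), "\n";
```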

An `id` is meant to be a unique identifier for an element in a page: a given `id` value should appear on only one element, and that value cannot be assigned to another element. Having more than one element with the same `id` value is invalid (since HTML 4 or XHTML 1.0, I think).

The way you use it looks valid.

EDIT

In that case, you may find an answer here: Same problem already solved

Rolice
  • @mustafa "my code is not working" is not helpful. Not at all. Explain your problems if you want help. – Gordon Mar 08 '12 at 11:00
  • @mustafa, your problem has been met before here in Stack Overflow, see my edited answer above. – Rolice Mar 08 '12 at 11:34