I've extracted the categories you see on the left of this page by building and walking the page's DOM tree. Now I want to build a new DOM from them, store it on my server, and reload it locally to speed up the whole process. I decided to build it while walking the original DOM. The traversal of the original DOM works, so assume that the parameters are correct.

I wrote this code to create the DOM:

$curr_lev=1;
$mydom=new DOMDocument();
$curr_parent=$mydom->createElement('products');
function create_dom($name, $link, $lev){
    global $curr_lev;
    global $curr_parent;
    global $mydom;
    switch ($lev){
        case $curr_lev:
            // same level: sibling of the last added node
            $curr_parent->appendChild($mydom->createElement($name, $link));
            break;
        case $curr_lev-1:
            // one level up: sibling of the last node's parent
            $curr_parent=$curr_parent->parentNode;
            $curr_parent->appendChild($mydom->createElement($name, $link));
            break;
        case $curr_lev+1:
            // one level down: child of the last added node
            $curr_parent=$curr_parent->lastChild;
            $curr_parent->appendChild($mydom->createElement($name, $link));
            break;
    }
    $curr_lev=$lev;
}

$mydom->formatOutput=TRUE;
$mydom->saveHTMLFile("products.xml");

Let me try to explain: create_dom() is called for each node of the original DOM. $lev is the level of the new node and $curr_lev is the level of the last added node. If they are equal, the last added node and the current node are children of the same parent. If $lev < $curr_lev, we have to go back up one level, and the new node becomes a sibling of the parent of the last added node. If $lev > $curr_lev, the current node is a child of the last added node.

The first problem is that when I execute I get this error:

Fatal error: Uncaught exception 'DOMException' with message 'Invalid Character Error' in C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php:71
Stack trace:
#0 C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php(71): DOMDocument->createElement('/joomla/compone...', 'Arduino')
#1 C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php(30): create_dom('Arduino', '/joomla/compone...', 1)
#2 C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php(38): visita_raff(Object(DOMElement), 1, 'dl')
#3 C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php(96): visita_raff(Object(DOMElement), 0, '')
#4 C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\index.php(21): include('C:\Users\Jacopo...')
#5 {main}
  thrown in C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php on line 71

$name usually looks like "arduino kit" and $link looks like "/joomla/componenent/virtuamart/..."

I've tried converting the values to UTF-8, but that didn't help.

I also wrote this test:

function create_xml(){
    $mydom=new DOMdocument("1.0", "ISO-8859-1");
    $primoElem=$mydom->createElement('foo');
    $primoElem->appendChild($mydom->createElement('arduinio', 'http:arduino'));
    $mydom->formatOutput=TRUE;
    return $mydom->saveXML("foo.xml");
}

I get no error and saveXML() returns 1, but nothing is written to the file!

What am I doing wrong? Please keep in mind that this is the first time I've worked with these things, so be gentle :)

jack_the_beast

1 Answer


The DOMException with the message

Invalid Character Error

means that you tried to create an element (DOMDocument::createElement()) whose element name contains invalid characters:

$mydom->createElement($name, $link)
                        ^
                        |
           first parameter is the element name

In XML, not every string is a valid element name: some characters are not allowed (for example a space " " or the forward slash /), and strings containing byte sequences that are not valid UTF-8 are rejected as well. DOMDocument in PHP accepts only UTF-8 as input. So much for the general picture. If you want to learn in depth which characters are valid in XML element names, you can find more information than you will likely ever need in your life in How to check if string is a valid XML element name?.

Now, if you look closely at the stack trace in the error message, you can probably spot the problem yourself:
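As a quick way to see which names are rejected, here is a minimal sketch. The helper `is_valid_element_name()` is hypothetical (it is not part of the question or of the DOM extension); it simply lets `createElement()` do the check and catches the exception.

```php
<?php
// Hypothetical helper, for illustration only: asks DOMDocument itself
// whether $name is acceptable as an element name.
function is_valid_element_name(string $name): bool
{
    try {
        (new DOMDocument())->createElement($name);
        return true;
    } catch (DOMException $e) {
        // Raised with "Invalid Character Error" for names like "arduino kit".
        return false;
    }
}

var_dump(is_valid_element_name('arduino'));      // bool(true)
var_dump(is_valid_element_name('arduino kit'));  // bool(false), contains a space
var_dump(is_valid_element_name('/joomla/x'));    // bool(false), contains a slash
```

This only tells you whether DOMDocument accepts the name; it does not turn an arbitrary category title into a valid one.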

DOMDocument->createElement('/joomla/compone...', 'Arduino') 
                            ^      ^

The / character is not valid inside an XML element name. The trace also shows that the link, not the category name, is what ends up as the first argument. Pass a valid element name there and you should be able to add your nodes.
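One possible way to sidestep the problem is to keep the element name fixed and store the variable data elsewhere. The sketch below is an assumption about how that could look, not the asker's code: the element name `item`, the `name` attribute, the helper `add_item()` and the sample link are all made up for illustration.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

// Attach the root to the document so it is part of the saved output.
$root = $doc->createElement('products');
$doc->appendChild($root);

// Hypothetical helper: fixed element name, category name as an attribute,
// link as a text node (text nodes are escaped when the document is serialized).
function add_item(DOMDocument $doc, DOMElement $parent, string $name, string $link): DOMElement
{
    $item = $doc->createElement('item');
    $item->setAttribute('name', $name);
    $item->appendChild($doc->createTextNode($link));
    $parent->appendChild($item);
    return $item;
}

// Sample values only; the real link from the question is truncated in the trace.
add_item($doc, $root, 'arduino kit', '/joomla/example?a=1&b=2');
echo $doc->saveXML();
```

Because the name with the space lives in an attribute value and the link in a text node, neither has to satisfy the rules for element names.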

hakre
  • Also, do not use the second argument to `createElement` as it also makes it easy to produce invalid XML. Use `createTextNode()` and append that node to the element. – Francis Avila Mar 06 '13 at 13:08
  • @FrancisAvila: What problem do you mean? Those two should be interchangeable and I haven't spotted any difference between them so far. But you seem to know more, please share. – hakre Mar 06 '13 at 13:14
  • Try `createElement('root','&')`--you will get an error and no text will be added to `<root/>` because the second argument is parsed for entities and entities are not escaped. Thus it's not exactly equivalent to `$d->createElement('root')->appendChild($d->createTextNode('&'))` and is a fertile source of bugs for people who think it *is* equivalent. (See also: `createElement('r','&amp;')` === `<r>&amp;</r>`, *not* `<r>&amp;amp;</r>`!) – Francis Avila Mar 06 '13 at 13:28
  • @hakre I get your point, but if I try this: `$curr_parent->appendChild($mydom->createElement('item', $link));` then 'item' is valid UTF-8, and as far as I know there's no limitation on the character set for the _value_ of the element, right? But I keep getting errors that seem to come from other problems: `Warning: DOMNode::appendChild(): Couldn't fetch DOMElement in C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php on line 79` (continues...) – jack_the_beast Mar 06 '13 at 16:37
  • `Warning: Couldn't fetch DOMElement. Node no longer exists in C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php on line 78 Notice: Undefined property: DOMElement::$lastChild in C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php on line 78 Fatal error: Call to a member function appendchild() on a non-object in C:\Users\Jacopo\Dropbox\Tirocinio\xampp-portable\htdocs\sites\prova\cerca categorie.php on line 79` What does that mean? – jack_the_beast Mar 06 '13 at 16:40
  • @JacopoGrassi: DOMDocument always expects UTF-8 encoded strings for input. So as far as the *value* parameter is concerned, yes, it needs to be UTF-8 encoded (otherwise, when you serialize the XML in text form, you will see warnings). – hakre Mar 06 '13 at 17:03
  • @JacopoGrassi: *Node no longer exists* means that you created some node some time ago, then passed some variables around, and in between the node got lost. So you still have the variable, but what it once represented no longer exists, e.g. because the document changed in between and invalidated it. It basically means you need to take a little more care; the global variables you use are probably what makes you run into that. But that is just a guess. – hakre Mar 06 '13 at 17:11
  • @FrancisAvila: Didn't think about &s, good point. I was only testing with `<` and `>`, which just work. – hakre Mar 06 '13 at 17:12
  • @hakre Thanks, I will try to figure out what's wrong in my code. Can you tell me two last things? 1) Is it correct to add attributes like this: `$node->setAttribute('name', $name);` or is there still an encoding problem? 2) The following code writes nothing but " " to the file, why? `function create_xml(){ $mydom=new DOMdocument(); $firstElem=$mydom->createElement('foo'); $firstElem->appendChild($mydom->createElement('arduinio', 'kit')); $mydom->formatOutput=TRUE; return $mydom->save("foo.xml"); }` – jack_the_beast Mar 06 '13 at 22:08
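
Two points from the comments above can be checked with a short sketch. It rests on two things: an element returned by `createElement()` is not part of the document until it is appended to it, and text added with `createTextNode()` is escaped on output. The variable names and the sample text are illustrative, not taken from the question.

```php
<?php
// 1) An element that is created but never appended is not serialized.
$orphanDoc = new DOMDocument();
$orphan = $orphanDoc->createElement('foo');   // created, but not attached
$orphanXml = $orphanDoc->saveXML();           // only the XML declaration

// 2) Append the root to the document, and add text through createTextNode().
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('foo');
$doc->appendChild($root);

$child = $doc->createElement('arduino');
$child->appendChild($doc->createTextNode('kits & shields'));
$root->appendChild($child);

$xml = $doc->saveXML();       // returns the XML as a string
// $doc->save('foo.xml');     // would write it to a file instead

echo $orphanXml;
echo $xml;
```

Note that `saveXML()` returns the markup as a string and takes an optional node, not a file name; `save()` is the method that writes to a file.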