
I'm using DOMDocument to generate a new XML file and I would like for the output of the file to be indented nicely so that it's easy to follow for a human reader.

For example, when DOMDocument outputs this data:

<?xml version="1.0"?>
<this attr="that"><foo>lkjalksjdlakjdlkasd</foo><foo>lkjlkasjlkajklajslk</foo></this>

I want the XML file to be:

<?xml version="1.0"?>
<this attr="that">
    <foo>lkjalksjdlakjdlkasd</foo>
    <foo>lkjlkasjlkajklajslk</foo>
</this>

I've been searching around for answers, and everything I've found says to control the whitespace this way:

$foo = new DOMDocument();
$foo->preserveWhiteSpace = false;
$foo->formatOutput = true;

But this does not seem to do anything. Perhaps this only works when reading XML? Keep in mind I'm trying to write new documents.

Is there anything built-in to DOMDocument to do this? Or a function that can accomplish this easily?

BenMorel
Josh Leitzel
  • I am not sure what the question is. The code you show will give the output you are asking for. Proof: http://codepad.org/4UGyRspx and http://codepad.org/bLTOFQrp - are you asking about the indentation level, e.g. the number of spaces used? – Gordon Mar 04 '12 at 18:20
  • There is a nice straightforward function (based on regular expressions) here: [Format XML with PHP](http://recurser.com/articles/2007/04/05/format-xml-with-php/) – Tomalak Apr 14 '09 at 06:08
  • Related as long as indentation is concerned: [Converting indentation with preg_replace (no callback)](http://stackoverflow.com/questions/8616594/converting-indentation-with-preg-replace-no-callback) – hakre Mar 10 '13 at 09:49

7 Answers


DOMDocument will do the trick. I personally spent a couple of hours Googling and trying to figure this out, and I noticed that if you use

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->formatOutput = true;
$xmlDoc->save($xml_file);

in that order, it just doesn't work. But if you use the same code in this order:

$xmlDoc = new DOMDocument();
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->formatOutput = true;
$xmlDoc->loadXML($xml);
$xmlDoc->save($xml_file);

it works like a charm. The reason is that preserveWhiteSpace is only consulted while loadXML() parses the input, so it has to be set before loading; set afterwards, the original whitespace text nodes stay in the tree and formatOutput will not re-indent around them. Hope this helps.
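A minimal sketch that wraps this ordering in a helper for re-indenting an XML string you already have. The function name prettyXml is hypothetical; the DOMDocument calls are the same ones used above.

```php
<?php
// Sketch: pretty-print an existing XML string with DOMDocument.
// preserveWhiteSpace is consulted by the parser, so it must be set before loadXML().
function prettyXml(string $xml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // drop whitespace-only text nodes while parsing
    $doc->formatOutput = true;        // ask libxml to add newlines and indentation on output
    $doc->loadXML($xml);
    return $doc->saveXML();
}

// Input that already contains some stray whitespace between tags
$pretty = prettyXml("<this attr=\"that\">\n <foo>a</foo>   <foo>b</foo></this>");
echo $pretty;
```

formatOutput could also be set after loading; only preserveWhiteSpace has to come first. Note that libxml indents with two spaces per level.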

BenMorel
Angel

After some help from John and playing around with this on my own, it seems that even DOMDocument's inherent support for formatting didn't meet my needs. So, I decided to write my own indentation function.

This is a pretty crude function that I just threw together quickly, so if anyone has any optimization tips or anything to say about it in general, I'd be glad to hear it!

function indent($text)
{
    // Create new lines where necessary
    $find = array('>', '</', "\n\n");
    $replace = array(">\n", "\n</", "\n");
    $text = str_replace($find, $replace, $text);
    $text = trim($text); // for the \n that was added after the final tag

    $text_array = explode("\n", $text);
    $new_array = array(); // collects the indented lines
    $open_tags = 0;
    foreach ($text_array as $key => $line)
    {
        $tabs = ''; // reset the indentation for every line
        if ($key > 1) // the XML declaration and the root element are not indented
        {
            for ($i = 1; $i <= $open_tags; $i++)
                $tabs .= "\t";
        }

        if ($key != 0)
        {
            if ((strpos($line, '</') === false) && (strpos($line, '>') !== false))
                $open_tags++;
            else if ($open_tags > 0)
                $open_tags--;
        }

        $new_array[] = $tabs . $line;
    }
    $indented_text = implode("\n", $new_array);

    return $indented_text;
}
Josh Leitzel
  • A quick remark: There is str_repeat() for creating the tabs. The rest of the function seems quite okay to me. You could set up a small performance comparison with the one I've found. As an alternative idea, you can use strtok() to tokenize the input iteratively (instead of replace/explode). – Tomalak Apr 14 '09 at 06:29
  • Thanks! I actually like the function you found better than my own, as I've discovered it does the formatting badly the deeper you go. And I never knew about either str_repeat() or strtok(), so thanks for that as well! – Josh Leitzel Apr 20 '09 at 23:16
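Following the suggestions in these comments, here is a sketch of the same line-based idea that tokenizes the input (with preg_split() standing in for strtok()) and builds the indentation with str_repeat(). Unlike the function above, it lowers the depth before printing a closing tag and keeps text-only elements on one line. The name indentTags is hypothetical, and it assumes no '>' characters inside attribute values, comments or CDATA.

```php
<?php
// Hypothetical helper (not from the thread): tab-indent a compact XML string.
// Assumes no '>' inside attribute values, comments or CDATA sections.
function indentTags(string $xml): string
{
    // Split into tags and the text between them; drop whitespace-only pieces
    $parts = preg_split('/(<[^>]+>)/', $xml, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $tokens = array_values(array_filter($parts, function ($t) {
        return trim($t) !== '';
    }));

    $lines = array();
    $depth = 0;
    $n = count($tokens);
    for ($i = 0; $i < $n; $i++) {
        $tok = trim($tokens[$i]);
        $pad = str_repeat("\t", $depth);
        if (substr($tok, 0, 2) === '</') {
            // A closing tag sits one level shallower than the content it closes
            $depth = max(0, $depth - 1);
            $lines[] = str_repeat("\t", $depth) . $tok;
        } elseif ($tok[0] !== '<' || substr($tok, 0, 2) === '<?' || substr($tok, 0, 2) === '<!' || substr($tok, -2) === '/>') {
            // Text, declarations, comments and empty elements keep the current depth
            $lines[] = $pad . $tok;
        } elseif ($i + 2 < $n && $tokens[$i + 1][0] !== '<' && substr(trim($tokens[$i + 2]), 0, 2) === '</') {
            // An element holding only text stays on one line: <foo>text</foo>
            $lines[] = $pad . $tok . trim($tokens[$i + 1]) . trim($tokens[$i + 2]);
            $i += 2;
        } else {
            // Any other opening tag starts a deeper level
            $lines[] = $pad . $tok;
            $depth++;
        }
    }
    return implode("\n", $lines);
}

$out = indentTags('<?xml version="1.0"?><this attr="that"><foo>lkjalksjdlakjdlkasd</foo><foo>lkjlkasjlkajklajslk</foo></this>');
echo $out;
```

Run on the question's input, this produces the layout the question asks for, using tabs instead of spaces.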

I have tried running the code below setting formatOutput and preserveWhiteSpace in different ways, and the only member that has any effect on the output is formatOutput. Can you run the script below and see if it works?

<?php
    echo "<pre>";
    $foo = new DOMDocument();
    //$foo->preserveWhiteSpace = false;
    $foo->formatOutput = true;
    $root = $foo->createElement("root");
    $root->setAttribute("attr", "that");
    $bar = $foo->createElement("bar", "some text in bar");
    $baz = $foo->createElement("baz", "some text in baz");
    $foo->appendChild($root);
    $root->appendChild($bar);
    $root->appendChild($baz);
    echo htmlspecialchars($foo->saveXML());
    echo "</pre>";
?>
John Rasch
  • Your code works fine, but it doesn't work for me with the way I've set it up. I have a class xml and inside that class I create a variable $this->xml which holds an instance of DOMDocument, and it doesn't seem to work with that setup. I would also prefer to have real tabs instead of just spaces. – Josh Leitzel Apr 14 '09 at 04:26
  • This seems like a special case then. I created a simple class with "xml" as a member, and it still worked. There are too many factors and without your exact code (or a simplified version that still fails for you) it's going to be impossible to reproduce. – John Rasch Apr 14 '09 at 04:56
  • Thanks for your help John. I've written a basic indentation function that will hopefully fix my problem (about to post it as an answer if you want to take a look). – Josh Leitzel Apr 14 '09 at 05:19
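Since the wish here is for real tabs, one option is to keep formatOutput and post-process its output. This is a sketch under the assumption that libxml indents with two spaces per level (its default). The function name formatWithTabs is hypothetical, and the regex would also rewrite leading spaces inside multi-line text content.

```php
<?php
// Sketch: let DOMDocument do the formatting, then swap its indentation for tabs.
// Assumes libxml's default of two spaces per indentation level.
function formatWithTabs(DOMDocument $doc): string
{
    $doc->formatOutput = true;
    $xml = $doc->saveXML();
    // Replace each leading run of two-space pairs with the same number of tabs
    return preg_replace_callback('/^(?:  )+/m', function (array $m): string {
        return str_repeat("\t", intdiv(strlen($m[0]), 2));
    }, $xml);
}

$doc = new DOMDocument();
$root = $doc->createElement('root');
$root->setAttribute('attr', 'that');
$doc->appendChild($root);
$root->appendChild($doc->createElement('bar', 'some text in bar'));
$root->appendChild($doc->createElement('baz', 'some text in baz'));

$tabbed = formatWithTabs($doc);
echo $tabbed;
```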

Which method do you call when printing the xml?

I use this:

$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->createElement('root');
$doc->appendChild($root);

(...)

$doc->formatOutput = true;
echo $doc->saveXML($root);

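It works perfectly, but it prints only the element you pass in, so you must print the <?xml ... ?> declaration manually. (Calling saveXML() with no argument outputs the whole document, declaration included.)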
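A small sketch contrasting the two calls. The child element and its text are placeholder content; both calls honour formatOutput.

```php
<?php
// Sketch: saveXML() with and without a node argument.
$doc = new DOMDocument('1.0', 'utf-8');
$doc->formatOutput = true;
$root = $doc->createElement('root');
$doc->appendChild($root);
$root->appendChild($doc->createElement('child', 'text')); // placeholder content

$whole = $doc->saveXML();          // whole document, beginning with the XML declaration
$onlyRoot = $doc->saveXML($root);  // just <root> and its descendants, no declaration

echo $whole, "\n", $onlyRoot, "\n";
```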

Jindra

Most answers in this topic treat the XML as a stream of text. Here is another approach that uses the DOM itself to do the indentation. With the default preserveWhiteSpace = true, loadXML() imports any indentation characters present in the XML source as text nodes. The idea is to remove those text nodes from the DOM and then recreate correctly formatted ones (see the comments in the code below for details).

The xmlIndent() function is implemented as a method of the indentDomDocument class, which extends DOMDocument. Below is a complete example of how to use it:

$dom = new indentDomDocument("1.0");
$xml = file_get_contents("books.xml");

$dom->loadXML($xml);
$dom->xmlIndent();
echo $dom->saveXML();

class indentDomDocument extends domDocument {
    public function xmlIndent() {
        // Retrieve all text nodes using XPath
        $x = new DOMXPath($this);
        $nodeList = $x->query("//text()");
        foreach($nodeList as $node) {
            // 1. "Trim" each text node by removing its leading and trailing spaces and newlines.
            $node->nodeValue = preg_replace("/^[\s\r\n]+/", "", $node->nodeValue);
            $node->nodeValue = preg_replace("/[\s\r\n]+$/", "", $node->nodeValue);
            // 2. Resulting text node may have become "empty" (zero length nodeValue) after trim. If so, remove it from the dom.
            if(strlen($node->nodeValue) == 0) $node->parentNode->removeChild($node);
        }
        // 3. Starting from root (documentElement), recursively indent each node. 
        $this->xmlIndentRecursive($this->documentElement, 0);
    } // end function xmlIndent

    private function xmlIndentRecursive($currentNode, $depth) {
        $indentCurrent = true;
        if(($currentNode->nodeType == XML_TEXT_NODE) && ($currentNode->parentNode->childNodes->length == 1)) {
            // A text node being the unique child of its parent will not be indented.
            // In this special case, we must tell the parent node not to indent its closing tag.
            $indentCurrent = false;
        }
        if($indentCurrent && $depth > 0) {
            // Indenting a node consists of inserting before it a new text node
            // containing a newline followed by a number of tabs corresponding
            // to the node depth.
            $textNode = $this->createTextNode("\n" . str_repeat("\t", $depth));
            $currentNode->parentNode->insertBefore($textNode, $currentNode);
        }
        if($currentNode->childNodes) {
            $indentClosingTag = false;
            foreach($currentNode->childNodes as $childNode) $indentClosingTag = $this->xmlIndentRecursive($childNode, $depth+1);
            if($indentClosingTag) {
                // If children have been indented, then the closing tag
                // of the current node must also be indented.
                $textNode = $this->createTextNode("\n" . str_repeat("\t", $depth));
                $currentNode->appendChild($textNode);
            }
        }
        return $indentCurrent;
    } // end function xmlIndentRecursive

} // end class indentDomDocument
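A shorter variant of the same idea is sketched below: it deletes the whitespace-only text nodes with an XPath query and then lets formatOutput re-indent, instead of inserting tab text nodes by hand. The function name reindent and the sample books document are hypothetical. It also removes whitespace-only text inside mixed content, which may matter for documents where that whitespace is significant.

```php
<?php
// Sketch of a shorter variant: delete whitespace-only text nodes with XPath,
// then let formatOutput re-indent instead of inserting tab text nodes by hand.
function reindent(DOMDocument $doc): string
{
    $xpath = new DOMXPath($doc);
    // The query result is a snapshot, so removing nodes while looping is safe
    foreach ($xpath->query('//text()[normalize-space(.) = ""]') as $node) {
        $node->parentNode->removeChild($node);
    }
    $doc->formatOutput = true;
    return $doc->saveXML();
}

$dom = new DOMDocument();
// Default preserveWhiteSpace = true, so the old indentation is loaded as text nodes
$dom->loadXML("<books>\n<book>A</book>\n      <book>B</book></books>");
$reindented = reindent($dom);
echo $reindented;
```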
frob59

Yo peeps,

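just found out that libxml will not indent an element whose children mix text and elements, and that includes the root element: adding whitespace there would change the text content, so formatting is switched off for that element and everything inside it. This is pretty unintuitive, but it is the reason that, for instance,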

$x = new \DOMDocument;
$x -> preserveWhiteSpace = false;
$x -> formatOutput = true;
$x -> loadXML('<root>a<b>c</b></root>');
echo $x -> saveXML();

will fail to indent.

https://bugs.php.net/bug.php?id=54972

So there you go, hope this helps.
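A sketch that reproduces the behaviour and shows one possible workaround, assuming you control the document's structure: wrap the loose text in its own element (the t element here is hypothetical) so the root has only element children. The helper name formatted is also hypothetical.

```php
<?php
// Sketch: libxml skips indentation inside an element whose children mix text and elements.
function formatted(string $xml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    $doc->formatOutput = true;
    $doc->loadXML($xml);
    return $doc->saveXML();
}

$mixed = formatted('<root>a<b>c</b></root>');          // text "a" next to element <b>
$wrapped = formatted('<root><t>a</t><b>c</b></root>'); // hypothetical <t> wrapper around the text

echo $mixed, $wrapped;
```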

header("Content-Type: text/xml");

$str = "";
$str .= "<customer>";
$str .= "<offer>";
$str .= "<opened></opened>";
$str .= "<redeemed></redeemed>";
$str .= "</offer>";
echo $str .= "</customer>";

If the script is served with an extension other than .xml (for example .php), first set the Content-Type header to text/xml, as in the first line above, so the browser treats the output as XML.
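The string above comes out on one line. A sketch of building the same customer document with DOMDocument instead, so formatOutput can indent it; the header() call is only mentioned in a comment so the snippet also runs from the command line.

```php
<?php
// Sketch: build the same <customer> document with DOMDocument so it comes out indented.
// In a web script, call header('Content-Type: text/xml') before echoing.
$doc = new DOMDocument('1.0');
$doc->formatOutput = true;

$customer = $doc->appendChild($doc->createElement('customer'));
$offer = $customer->appendChild($doc->createElement('offer'));
$offer->appendChild($doc->createElement('opened'));
$offer->appendChild($doc->createElement('redeemed'));

$xml = $doc->saveXML();
echo $xml;
```

Empty elements are serialized as <opened/>, which is equivalent to <opened></opened>; passing LIBXML_NOEMPTYTAG as the options argument to saveXML() produces the long form.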

uınbɐɥs