58

Using the following simple code:

package test;

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class TestOutputKeys {
    public static void main(String[] args) throws TransformerException {

        // Instantiate transformer input
        Source xmlInput = new StreamSource(new StringReader(
                "<!-- Document comment --><aaa><bbb/><ccc/></aaa>"));
        StreamResult xmlOutput = new StreamResult(new StringWriter());

        // Configure transformer
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(); // An identity transformer
        transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "testing.dtd");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(xmlInput, xmlOutput);

        System.out.println(xmlOutput.getWriter().toString());
    }

}

I get the output:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Document comment --><!DOCTYPE aaa SYSTEM "testing.dtd">

<aaa>
<bbb/>
<ccc/>
</aaa>

Question A: The DOCTYPE declaration appears after the document comment. Is it possible to make it appear before the document comment?

Question B: How do I achieve indentation using only the Java SE 5.0 API? This question is essentially identical to "How to pretty-print xml from java"; however, almost all answers to that question depend on external libraries. The only applicable answer that uses only Java's API (posted by a user named Lorenzo Boccaccia) is basically the same as the code posted above, but it does not work for me: as the output shows, I get no indentation.

I am guessing that you have to set the number of spaces to use for indentation, as many of the answers with external libraries do, but I cannot find where to specify that in the Java API. Given that the Java API lets you set an indentation property to "yes", it must be possible to perform indentation somehow. I just can't figure out how.

Alderath
  • Repeating the comment I made in http://stackoverflow.com/questions/139076/how-to-pretty-print-xml-from-java: you can now pretty-print without external libraries. See http://xerces.apache.org/xerces2-j/faq-general.html#faq-6. Yes, this is a Xerces FAQ, but the answer covers standard JDK classes. The initial 1.5 implementation of these classes had many issues, but everything works fine from 1.6 on. Copy the LSSerializer example in the FAQ, chop the "..." bit and add `writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);` after the `LSSerializer writer = ...` line. – George Hawkins May 04 '11 at 08:51
  • This code snippet is vulnerable to XML eXternal Entity Injection (XXE). See https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html#transformerfactory – fanbondi Feb 09 '21 at 14:58
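A minimal sketch of the LSSerializer approach George Hawkins describes, assuming a JDK 6 or later runtime whose bundled DOM implementation supports the DOM Level 3 Load and Save ("LS") feature. The class and method names (`LsPrettyPrint`, `prettyPrint`) are illustrative only, and the exact indentation width is left to the implementation.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LsPrettyPrint {

    // Parses the XML string and re-serializes it with a DOM Level 3 LSSerializer.
    public static String prettyPrint(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Look up a DOM implementation that supports the "LS" (Load and Save) feature.
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();

        // Only switch pretty-printing on if this implementation supports it.
        DOMConfiguration config = writer.getDomConfig();
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        return writer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint("<!-- Document comment --><aaa><bbb/><ccc/></aaa>"));
    }
}
```

Unlike the Transformer route, "format-pretty-print" is a standard DOM Level 3 parameter, so no vendor-specific property key is involved; the parameter only asks for pretty output and does not let you choose the indent width.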

4 Answers

119

The missing part is the indent amount. You can set both the indentation and the indent amount as follows:

transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.transform(xmlInput, xmlOutput);
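For reference, a self-contained version of the question's program with the extra property applied. This is a sketch that assumes the Xalan-derived TransformerFactory bundled with the JDK; the `{http://xml.apache.org/xslt}indent-amount` key is not part of the javax.xml.transform API, so another implementation may simply ignore it. The class name `IndentAmountDemo` is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IndentAmountDemo {

    // Identity-transforms the XML string with indentation switched on.
    // The indent-amount key is Xalan-specific: it is namespace-qualified,
    // so implementations that do not know it are allowed to ignore it.
    public static String prettyPrint(String xml, int indentAmount) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                String.valueOf(indentAmount));

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(prettyPrint("<!-- Document comment --><aaa><bbb/><ccc/></aaa>", 2));
    }
}
```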
Rich Seller
  • good to know, I think it failed because I had an old version of xalan, double checking – Rich Seller Aug 12 '09 at 08:19
  • This solution indents the resulting XML document, compiling without errors or warnings. – Dave Jarvis Aug 12 '09 at 08:21
  • Isn't this solution also kind of library-dependent? JRE 5.0/JDK 5.0 ships with Apache Xalan, am I correct? What if a user has switched the TransformerFactory implementation to some other implementation that also conforms to the javax.xml.transform API? This will fail then, won't it? That property seems to be specific to the Apache implementation, IMO. – Alderath Aug 12 '09 at 08:24
  • @Rich Seller. The error you got, did it say that the apache property could not be recognized? If so, what version of java are you using and what does TransformerFactory.newInstance().getClass().getName() return? – Alderath Aug 12 '09 at 08:27
  • As you say, it depends upon Xalan, but this is part of the JDK. As far as I know, there isn't an API-level setting for the indentation amount, so if a user is using a different implementation, you'll need to add switch processing to set the indentation for that implementation. But aren't you in control of the implementation used? – Rich Seller Aug 12 '09 at 08:32
  • My understanding of what an API is tells me that it should consist of functions/methods that perform a specified task, and that while using an API there should be no need to address the underlying implementation directly. But then again, I am only a novice programmer, and maybe things only work the way I think they should in a Utopian world. Still, I think the fact that OutputKeys.INDENT exists at the API level SHOULD mean that API-level indentation is possible, unless the API is flawed (or Apache's implementation is flawed and does not interpret the property as it should). – Alderath Aug 12 '09 at 08:45
  • Well... I guess I will have to abandon my vision of Utopia and force Xalan to be used as the implementation. – Alderath Aug 12 '09 at 08:49
  • I think you're right actually, this should be part of the API, but for some reason it isn't. – Rich Seller Aug 12 '09 at 09:01
  • This is the way I've always done it, but here it didn't work, probably because of a different XML library. I did `factory.setAttribute("indent-number", 4);` and now it works. – Adrian Smith Oct 21 '10 at 13:27
  • transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2"); What can I use if I'm offline? =) –  Oct 23 '12 at 12:06
  • thanks, it did the job well, but the root node (in my case ) is like this: ``. How can I make it go one line below? – Bugs Happen Jul 06 '15 at 20:08
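Adrian Smith's `indent-number` variant sets the indentation on the factory rather than on the transformer. A sketch, assuming the JDK's built-in XSLTC-based TransformerFactory recognizes that attribute; `TransformerFactory.setAttribute` is specified to throw IllegalArgumentException for attributes an implementation does not recognize, so the code falls back to plain INDENT in that case. `IndentNumberDemo` is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IndentNumberDemo {

    public static String prettyPrint(String xml, int indent) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        try {
            // Factory-level attribute; assumed to be understood by the JDK's built-in factory.
            factory.setAttribute("indent-number", Integer.valueOf(indent));
        } catch (IllegalArgumentException unsupported) {
            // This factory does not know the attribute: continue with INDENT alone,
            // which may produce line breaks without any indentation.
        }
        Transformer transformer = factory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(prettyPrint("<aaa><bbb/><ccc/></aaa>", 4));
    }
}
```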
5

A little util class as an example...

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlUtil {

    // Parses an existing file into a namespace-aware DOM Document.
    public static Document file2Document(File file) throws Exception {
        if (file == null || !file.exists()) {
            throw new IllegalArgumentException("File must exist! ["
                    + (file == null ? "NULL" : "Could not be found: " + file.getAbsolutePath())
                    + "]");
        }
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setNamespaceAware(true);
        // parse(File) opens and closes the input stream itself
        return dbFactory.newDocumentBuilder().parse(file);
    }

    // Parses an XML string into a namespace-aware DOM Document.
    public static Document string2Document(String xml) throws Exception {
        InputSource src = new InputSource(new StringReader(xml));
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setNamespaceAware(true);
        return dbFactory.newDocumentBuilder().parse(src);
    }

    // Pretty-print settings: 2-space indent, 120-character lines, UTF-8.
    public static OutputFormat getPrettyPrintFormat() {
        OutputFormat format = new OutputFormat();
        format.setLineWidth(120);
        format.setIndenting(true);
        format.setIndent(2);
        format.setEncoding("UTF-8");
        return format;
    }

    // Serializes the document to a string using the given format.
    public static String document2String(Document doc, OutputFormat format) throws Exception {
        StringWriter stringOut = new StringWriter();
        XMLSerializer serial = new XMLSerializer(stringOut, format);
        serial.serialize(doc);
        return stringOut.toString();
    }

    public static String document2String(Document doc) throws Exception {
        return document2String(doc, getPrettyPrintFormat());
    }

    // Writes the document to the file using the default pretty-print format.
    public static void document2File(Document doc, File file) throws Exception {
        document2File(doc, file, getPrettyPrintFormat());
    }

    // Writes the document to the file using the given format, closing the stream afterwards.
    public static void document2File(Document doc, File file, OutputFormat format) throws Exception {
        OutputStream out = new FileOutputStream(file);
        try {
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(doc);
        } finally {
            out.close();
        }
    }
}

XMLSerializer (together with OutputFormat) is provided by xercesImpl from the Apache Software Foundation. Here is the Maven dependency:

<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>

You can find the dependency for your favourite build tool here: http://mvnrepository.com/artifact/xerces/xercesImpl/2.11.0.

Rob
  • Add the references to external libraries, please. This sample doesn't work with the JDK only. XMLSerializer belongs to org.apache.xml.serialize. – Aubin Aug 23 '15 at 09:12
1

You could probably prettify everything with an XSLT file. Google throws up a few results, but I can't comment on their correctness.
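One way to avoid depending on the processor's own indentation support is a stylesheet that writes the line breaks and spaces itself. The stylesheet below is a hypothetical sketch written for illustration (not one of the search results mentioned above): it uses two spaces per nesting level and assumes element-only content, since in mixed content the inserted whitespace would become part of the document's text. `XsltPrettyPrint` is likewise a made-up name.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltPrettyPrint {

    // Each child element or comment is preceded by a newline plus the current
    // indentation; an element with element children gets a newline before its end tag.
    private static final String PRETTY_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='*'>"
        + "<xsl:param name='indent' select='\"&#10;\"'/>"
        + "<xsl:copy>"
        + "<xsl:copy-of select='@*'/>"
        + "<xsl:for-each select='node()'>"
        + "<xsl:if test='self::* or self::comment()'>"
        + "<xsl:value-of select='concat($indent, \"  \")'/>"
        + "</xsl:if>"
        + "<xsl:apply-templates select='.'>"
        + "<xsl:with-param name='indent' select='concat($indent, \"  \")'/>"
        + "</xsl:apply-templates>"
        + "</xsl:for-each>"
        + "<xsl:if test='*'><xsl:value-of select='$indent'/></xsl:if>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='comment() | text() | processing-instruction()'>"
        + "<xsl:copy/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String prettyPrint(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(PRETTY_XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(prettyPrint("<aaa><bbb/><ccc/></aaa>"));
    }
}
```

Because the whitespace comes from ordinary text output, the result should not depend on which XSLT 1.0 processor is plugged in, at the cost of the transformation overhead skaffman mentions.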

McDowell
  • I like this idea. I use XSLT a fair bit for this sort of thing (namespace manipulation, whitespace control, etc.). It's not efficient, but it's quite easy, and not parser-dependent. – skaffman Aug 12 '09 at 10:45
0

To make the output a valid XML document, NO. A valid XML document must start with a processing instruction. See the XML specification http://www.w3.org/TR/REC-xml/#sec-prolog-dtd for more details.

Oskar
  • This answer is based on a misunderstanding of the question. The comment is allowed to be either before or after the doctype declaration, i.e. you can have either `xmlDeclaration comment doctypeDeclaration` or `xmlDeclaration doctypeDeclaration comment`. The question never spoke about putting anything before the xmlDeclaration. – Alderath Jan 20 '14 at 09:02
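Since both orders are allowed, one possible workaround for Question A is to write the XML declaration and DOCTYPE yourself and let the transformer output only the rest, with OMIT_XML_DECLARATION set and DOCTYPE_SYSTEM left unset. A sketch under those assumptions: `DoctypeFirst` and its parameters are hypothetical, the root name is passed in rather than read from the document, and the declared UTF-8 encoding is only accurate if the resulting string is later written out as UTF-8.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DoctypeFirst {

    // Writes the prolog by hand, then appends the transformer's output
    // (comment plus root element). Nothing here checks that rootName
    // matches the document's actual root element.
    public static String withDoctypeFirst(String xml, String rootName, String systemId)
            throws TransformerException {
        StringWriter out = new StringWriter();
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.write("<!DOCTYPE " + rootName + " SYSTEM \"" + systemId + "\">\n");

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Suppress the transformer's own declaration; DOCTYPE_SYSTEM is deliberately not set.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(withDoctypeFirst(
                "<!-- Document comment --><aaa><bbb/><ccc/></aaa>", "aaa", "testing.dtd"));
    }
}
```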