
We have a logging system that logs payloads on demand for troubleshooting and in non-prod environments. However, due to a column size constraint, we truncate the XML if it is longer than 5000 characters. The XML is not pretty-printed; it is one continuous string.

When the XML is truncated, it is hard to format it so that the data in it is easy to check. Usually, I use Java's DocumentBuilderFactory to format a complete XML document, but that fails when used against an incomplete one.

I would like a solution that can format incomplete XML instead of throwing an error.

Mark Rotteveel
AKPuvvada
  • 2
    Welcome to Stack Overflow. Please take the [tour] to learn how Stack Overflow works and read [ask] on how to improve the quality of your question. Then check the [help/on-topic] to see what questions you can ask. Please show your attempts you have tried and the problems/error messages you get from your attempts. You might also check similar questions like https://stackoverflow.com/questions/68558409/how-to-beautify-incomplete-xml-documents – Progman Jul 30 '21 at 17:50
  • 1
    See https://stackoverflow.com/questions/68558409/how-to-beautify-incomplete-xml-documents/68560000#68560000, it has some ideas on that. – Martin Honnen Jul 30 '21 at 20:34
  • Thanks, Martin Honnen. I tried to adapt it to Java. But did not work so far but it gave me some pointers. – AKPuvvada Jul 31 '21 at 04:15
  • The Brute-Process code provided at https://stackoverflow.com/a/19236572/3107741 works even better. – AKPuvvada Aug 03 '21 at 11:32

1 Answer


Following the approach Michael Kay outlined in the answer I linked to in a comment, you can run an identity Transformer with indentation over a StreamSource and catch any parse exception. The code looks like this:

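    import java.io.StringReader;
    import java.io.StringWriter;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    // Truncated sample input; the complete document would end with ".</p></section></root>".
    String xml = "<root><section><p>Paragraph 1.</p><p>Paragraph 2.";

    // A Transformer created without a stylesheet performs an identity transformation.
    Transformer identityTransformer = TransformerFactory.newInstance().newTransformer();
    identityTransformer.setOutputProperty("indent", "yes");

    StringWriter resultWriter = new StringWriter();
    StreamResult resultStream = new StreamResult(resultWriter);

    try {
        identityTransformer.transform(new StreamSource(new StringReader(xml)), resultStream);
    }
    catch (TransformerException e) {
        // The parser fails at the truncation point: print the error, then the output produced so far.
        System.out.println(e.getMessageAndLocation());
        System.out.println(resultWriter.toString());
    }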

For that example, the output at least gets as far as the last p element:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <section>
      <p>Paragraph 1.</p>
      <p

So some information at the end is lost, but up to that incomplete element the code at least breaks the long one-liner input into several lines.

Note: I used Saxon 10 HE as the default Transformer. If you use the JRE's built-in one or Xalan, you will need to set identityTransformer.setOutputProperty("{http://xml.apache.org/xalan}indent-amount", "2"); otherwise you get line breaks but no indentation.
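
For the JRE's built-in Transformer, the pieces above could be wrapped up roughly as follows. This is only a sketch under assumptions: the class name PartialXmlFormatter and the method formatLeniently are made up for illustration, and how much partial output survives a parse failure depends on the Transformer implementation and its buffering, so the truncated case may return less than the Saxon output shown above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of any library.
class PartialXmlFormatter {

    // Indents the XML as far as it can be parsed and returns whatever
    // the serializer had written when parsing stopped.
    static String formatLeniently(String xml) {
        StringWriter out = new StringWriter();
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.INDENT, "yes");
            // Needed for the JRE's built-in Transformer or Xalan, as noted above.
            identity.setOutputProperty("{http://xml.apache.org/xalan}indent-amount", "2");
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        } catch (TransformerException e) {
            // Truncated input ends up here; fall through and keep the partial output.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatLeniently("<root><section><p>Paragraph 1.</p><p>Paragraph 2."));
    }
}
```

Swallowing the exception is a deliberate choice for a log viewer, where a partial result is more useful than a stack trace. If the error location matters, e.getMessageAndLocation() could be appended to the returned text instead.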

Martin Honnen