12

I am parsing an XML file with JSoup, modifying some elements, and writing the result back out.

The output file has extra spaces and line breaks. Is there a way to print it in the original format?

Original:

  <attributes>
        <divisions>4</divisions>
        <key>
          <fifths>0</fifths>
          <mode>major</mode>
          </key>
...

New:

<attributes> 
    <divisions>
     4
    </divisions> 
    <key> 
     <fifths>
      0
     </fifths> 
     <mode>
      major
     </mode> 
    </key> 
...

Any idea how to remove the extra spaces and line breaks from inside the elements?

I currently read in and print the document like this:

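// Parse the input stream as XML rather than HTML.
doc = Jsoup.parse(is, "UTF-8", "", Parser.xmlParser());

// Serialize the document and write it out as UTF-8.
BufferedWriter htmlWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("output.xml"), "UTF-8"));
htmlWriter.write(doc.toString());
htmlWriter.close(); // flush buffered output to disk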
dorien

2 Answers

20

With some help from Aleksandr M, I solved it in the following way:

doc.outputSettings().indentAmount(0).prettyPrint(false);

A little less nice, but this also seemed to do the trick:

htmlWriter.write(doc.toString().replaceAll(">\\s+",">").replaceAll("\\s+<","<"));
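To see what the regex variant actually does, here is a minimal sketch using only the standard library (no JSoup). The class name, the helper name `stripAroundTags`, and the sample strings are made up for illustration. The two `replaceAll` calls only remove whitespace that directly touches a tag, so purely nested elements come out clean, but mixed content with inline elements loses its word spacing.

```java
class TagWhitespaceDemo {
    // Same two replacements as in the answer, wrapped in a helper.
    // The helper name is hypothetical, chosen for this sketch.
    static String stripAroundTags(String xml) {
        return xml.replaceAll(">\\s+", ">").replaceAll("\\s+<", "<");
    }

    public static void main(String[] args) {
        // Pretty-printed element: the line breaks around the text are removed.
        String prettyPrinted = "<divisions>\n 4\n</divisions>";
        System.out.println(stripAroundTags(prettyPrinted));
        // prints <divisions>4</divisions>

        // Mixed content (assumed example): spaces next to the inline tags vanish too.
        String mixed = "<p>A <b>doozy</b> dog</p>";
        System.out.println(stripAroundTags(mixed));
        // prints <p>A<b>doozy</b>dog</p>
    }
}
```

If the document contains mixed content like the second case, the `outputSettings()` route is likely the safer choice, since it does not rewrite the serialized text after the fact.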
dorien
  • Thanks! `outputSettings()` is great. `replaceAll()` is problematic in that it can join e.g. this: `A doozy dog` into this textual content: `Adoozydog`, right? – KajMagnus Jul 13 '20 at 09:59
  • In the Javadocs, the description of `indentAmount(int indentAmount)` reads "Set the indent amount for pretty printing". I believe you wouldn't need to set `indent` to `0` if you're setting `prettyPrint` to `false`. – Farid Sep 21 '21 at 12:33
1

Try this:

doc = Jsoup.parse(is, "UTF-8", "", Parser.xmlParser());
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
..
..

Hope this helps

web-nomad