
What's the easiest way to produce the canonical form of an XML file in Java? Does anyone have working code for that? I've found several links on the net, but I can't get any of them to work :/

Thanks,

Ivan

EDIT: I used the canonicalizer proposed below, but I get strange results. To be more precise, this method doesn't remove the whitespace between elements. This is what I get:

<Metric xmlns="http://www.ibm.com/wsla" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="total_memory_consumption_metric" type="double" unit="Mbit" xsi:schemaLocation="http://www.ibm.com/wsla WSLA.xsd">                        <Source>ServiceProvider</Source>                        <MeasurementDirective resultType="double" xsi:type="StatusRequest">                              <RequestURI> ***unused*** </RequestURI>                        </MeasurementDirective>                  </Metric>

2 Answers


Use the Canonicalizer class from the Apache XML Security project.

Initialize the library.

org.apache.xml.security.Init.init(); 

Convert your XML.

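// Canonical XML is always UTF-8, so decode with an explicit charset instead of the platform default.
Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
byte[] canonXmlBytes = canon.canonicalize(yourXmlBytes);
String canonXmlString = new String(canonXmlBytes, java.nio.charset.StandardCharsets.UTF_8);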
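
Regarding the whitespace mentioned in the question's EDIT: Canonical XML retains whitespace in element content, so the spaces between elements are expected output rather than a bug. If they are not significant in your documents, one option is to strip whitespace-only text nodes before canonicalizing. The following is a minimal sketch using only JDK classes; the class and method names (WhitespaceStripper, stripWhitespace) are hypothetical, and it assumes the document has no mixed content in which whitespace-only text matters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Apache XML Security.
public class WhitespaceStripper {

    /** Recursively removes text nodes that contain only whitespace. */
    static void stripWhitespaceOnlyText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember the sibling before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }

    /** Parses the input, strips whitespace-only text nodes, and returns UTF-8 bytes. */
    static byte[] stripWhitespace(byte[] xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // keep namespace declarations intact
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        stripWhitespaceOnlyText(doc.getDocumentElement());

        // Identity transform: serializes the DOM without adding indentation.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Metric>\n    <Source>ServiceProvider</Source>\n"
                + "    <RequestURI> ***unused*** </RequestURI>\n</Metric>";
        byte[] stripped = stripWhitespace(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stripped, StandardCharsets.UTF_8));
    }
}
```

The resulting bytes can then be passed to canon.canonicalize(...) in place of yourXmlBytes. Text such as " ***unused*** " keeps its leading and trailing spaces, because only nodes that are entirely whitespace are removed.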
Basil Bourque
eolith
  • To anyone who hasn't worked with the Apache XML Security library before: you must initialize the library with the static method "org.apache.xml.security.Init.init();" before invoking any other code from it, or you'll get an error. – Stew Nov 15 '12 at 22:16
  • The point @Stew makes is quite important. It should really be part of the answer itself. – aroth Apr 11 '16 at 04:01
  • I find this solution doesn't work on emoji characters, which become "??" after canonicalization. – Solomon Tam Aug 22 '18 at 09:43
  • @Stew I added your library initialization to the original answer. Better late than never :) – eolith Feb 12 '20 at 11:57
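
On the emoji comment above: one plausible cause (an assumption, since the comment doesn't show how the string was built) is converting between bytes and String with the platform default charset on a JVM where that default is not UTF-8. Canonical XML is defined as UTF-8, so it may help to name the charset explicitly in both directions. A small sketch with hypothetical names (Utf8Decode, decodeCanonical), using US-ASCII as a stand-in for a non-UTF-8 default:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the charset mismatch.
public class Utf8Decode {

    /** Decodes canonicalizer output, which is UTF-8 by definition. */
    static String decodeCanonical(byte[] canonXmlBytes) {
        return new String(canonXmlBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "<msg>hi \uD83D\uDE00</msg>"; // contains U+1F600, an emoji
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with a charset that cannot represent the emoji loses it.
        String lossy = new String(utf8, StandardCharsets.US_ASCII);

        System.out.println(decodeCanonical(utf8).equals(original)); // true
        System.out.println(lossy.equals(original));                 // false
    }
}
```

The same applies to the input side: yourXmlString.getBytes(StandardCharsets.UTF_8) is safer than a bare getBytes() when the bytes are handed to the canonicalizer.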

Another option is nu.xom.canonical.Canonicalizer if you're using XOM, or if you don't otherwise have a need for Apache XML Security.

David Moles
  • This is the better option. In my case I called Serializer first and then directed the result to Canonicalizer. The result was a more readable canonical XML. – Georgios F. Apr 20 '20 at 16:37