
I have an Object that is being marshalled to XML using JAXB. One element contains a String that includes quotes ("). The resulting XML has &quot; where the " existed.

Even though this is normally preferred, I need my output to match a legacy system. How do I force JAXB to NOT convert the HTML entities?

--

Thank you for the replies. However, I never see the handler's escape() method being called. Can you take a look and see what I'm doing wrong? Thanks!

package org.dc.model;

import java.io.IOException;
import java.io.Writer;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

import org.dc.generated.Shiporder;

import com.sun.xml.internal.bind.marshaller.CharacterEscapeHandler;

public class PleaseWork {
    public void prettyPlease() throws JAXBException {
        Shiporder shipOrder = new Shiporder();
        shipOrder.setOrderid("Order's ID");
        shipOrder.setOrderperson("The woman said, \"How ya doin & stuff?\"");

        JAXBContext context = JAXBContext.newInstance("org.dc.generated");
        Marshaller marshaller = context.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.setProperty(CharacterEscapeHandler.class.getName(),
                new CharacterEscapeHandler() {
                    @Override
                    public void escape(char[] ch, int start, int length,
                            boolean isAttVal, Writer out) throws IOException {
                        out.write("Called escape for characters = " + new String(ch, start, length));
                    }
                });
        marshaller.marshal(shipOrder, System.out);
    }

    public static void main(String[] args) throws Exception {
        new PleaseWork().prettyPlease();
    }
}

--

The output is this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<shiporder orderid="Order's ID">
    <orderperson>The woman said, &quot;How ya doin &amp; stuff?&quot;</orderperson>
</shiporder>

and as you can see, the callback is never displayed. (Once I get the callback being called, I'll worry about having it actually do what I want.)

--

animuson
Elliot
  • Deleted my prior answer, since it was utterly wrong... however, it's still worth pointing out that `&quot;` is not an HTML entity, it's an XML escape. – skaffman Oct 01 '09 at 22:54
  • It's actually both an XML and HTML entity. http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references – Elliot Oct 02 '09 at 11:26
  • @Elliot: but in this context it is an XML escape. This is not HTML. – Stephen C Oct 03 '09 at 01:11
  • A teammate of mine figured this out without requiring a Vendor Specific implementation. Shown above. – Elliot Oct 05 '09 at 18:48
  • I've run into the same problem. Which version of JAXB are you using? Currently I use JAXB 2.2.4 and in that release quotes are escaped only in attributes. However I have some XML files, produced by v2.1.13, which have the same "problem". However I have checked the source of `MinimumEscapeHandler` from 2.1.13, and it seems to be OK (I even checked 2.0.1). Perhaps, this escape handler was not activated (thanks to @GrzegorzOledzki for bug report). – dma_k Nov 02 '11 at 10:33
  • After debugging it turned out to be a ridiculous problem: [Escape policy for quote (") is different when the serialization is performed to OutputStream or Writer](http://java.net/jira/browse/JAXB-868). – dma_k Nov 03 '11 at 09:49
  • Check https://stackoverflow.com/questions/4435934/handling-xml-escape-characters-e-g-quotes-using-jaxb-marshaller/4457559#4457559 – javdev Sep 13 '19 at 06:36

14 Answers

Solution my teammate found:

PrintWriter printWriter = new PrintWriter(new FileWriter(xmlFile));
DataWriter dataWriter = new DataWriter(printWriter, "UTF-8", DumbEscapeHandler.theInstance);
marshaller.marshal(request, dataWriter);

Instead of passing the xmlFile to marshal(), pass the DataWriter, which knows both the encoding and an appropriate escape handler, if any.

Note: Since DataWriter and DumbEscapeHandler are both in the com.sun.xml.internal.bind.marshaller package, javac hides them by default, so you may need to compile with -XDignore.symbol.file (or put them on the boot class path).

Elliot
  • Did you try @laz's answer? That looks like the way to do it "properly". – skaffman Oct 12 '09 at 07:16
  • The above method works - instead of using properties to set the escape handler in jdk 1.6.0.22 – Anna May 26 '12 at 12:10
  • This one works for Java 7 instead of setting handler property to marshaller, the problem is how to format the output of the datawriter now – Yassir Khaldi Aug 05 '19 at 11:59

--

I have just made my custom handler as a class like this:

import java.io.IOException;
import java.io.Writer;

import com.sun.xml.bind.marshaller.CharacterEscapeHandler;

public class XmlCharacterHandler implements CharacterEscapeHandler {

    public void escape(char[] buf, int start, int len, boolean isAttValue,
            Writer out) throws IOException {
        String st = new String(buf, start, len);

        // Pass chunks containing a CDATA section through untouched;
        // escape all five special characters everywhere else.
        if (!st.contains("CDATA")) {
            st = st.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("'", "&apos;")
                .replace("\"", "&quot;");
        }
        out.write(st);
    }

}

In the marshalling code, simply call:

marshaller.setProperty(CharacterEscapeHandler.class.getName(),
                new XmlCharacterHandler());

It works fine.

sanastasiadis
Laura Liparulo

--

I've been playing with your example a bit and debugging the JAXB code, and it seems to be something specific to the UTF-8 encoding. The escapeHandler property of MarshallerImpl seems to be set properly, but it is not used in every context. Searching for calls of MarshallerImpl.createEscapeHandler(), I found:

public XmlOutput createWriter( OutputStream os, String encoding ) throws JAXBException {
    // UTF8XmlOutput does buffering on its own, and
    // otherwise createWriter(Writer) inserts a buffering,
    // so no point in doing a buffering here.

    if(encoding.equals("UTF-8")) {
        Encoded[] table = context.getUTF8NameTable();
        final UTF8XmlOutput out;
        if(isFormattedOutput())
            out = new IndentingUTF8XmlOutput(os,indent,table);
        else {
            if(c14nSupport)
                out = new C14nXmlOutput(os,table,context.c14nSupport);
            else
                out = new UTF8XmlOutput(os,table);
        }
        if(header!=null)
            out.setHeader(header);
        return out;
    }

    try {
        return createWriter(
            new OutputStreamWriter(os,getJavaEncoding(encoding)),
            encoding );
    } catch( UnsupportedEncodingException e ) {
        throw new MarshalException(
            Messages.UNSUPPORTED_ENCODING.format(encoding),
            e );
    }
}

Note that in your setup the top branch (...equals("UTF-8")...) is taken, and that branch does not use the escapeHandler. If you set the encoding to anything else, the bottom part of this method runs instead: it delegates to createWriter(Writer, String), which does use the escapeHandler, so your handler plays its role. So, adding...

    marshaller.setProperty(Marshaller.JAXB_ENCODING, "ASCII");

makes your custom CharacterEscapeHandler get called. I'm not really sure, but I would guess this is a bug in JAXB.

Grzegorz Oledzki
  • Thanks for your response, Grzegorz. I agree with you, it appears to be a JAXB bug. And if there is a legitimate reason for it, it'd be nice to have it in the documentation. Thanks! – Elliot Oct 06 '09 at 12:46
  • I've filed a bug report in JAXB tracking tool: https://jaxb.dev.java.net/issues/show_bug.cgi?id=693 – Grzegorz Oledzki Oct 06 '09 at 14:38

--

I would say that the easiest way to do it is by overriding CharacterEscapeHandler:

marshaller.setProperty("com.sun.xml.bind.characterEscapeHandler", new CharacterEscapeHandler() {
    @Override
    public void escape(char[] ch, int start, int length, boolean isAttVal,
                       Writer out) throws IOException {
        out.write(ch, start, length);
    }
});
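Writing the characters through unchanged also leaves `&` and `<` unescaped, so a value such as `How ya doin & stuff?` would come out as XML that a parser rejects. A more conservative sketch, assuming the goal is only to keep `"` literal in element text, escapes `&`, `<` and `>` everywhere and `"` only when `isAttVal` is true. So that it compiles without JAXB on the classpath, it declares a hypothetical `EscapeHandler` interface with the same `escape(char[], int, int, boolean, Writer)` signature as `CharacterEscapeHandler`; in real code you would implement the JAXB interface instead.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class SelectiveEscape {

    // Hypothetical stand-in with the same method signature as the JAXB RI's
    // CharacterEscapeHandler, so this sketch compiles without JAXB.
    interface EscapeHandler {
        void escape(char[] ch, int start, int length, boolean isAttVal, Writer out)
                throws IOException;
    }

    // Escapes only what well-formedness requires; double quotes are
    // escaped inside attribute values but left literal in element text.
    static final EscapeHandler QUOTE_FRIENDLY = (ch, start, length, isAttVal, out) -> {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            switch (c) {
                case '&': out.write("&amp;"); break;
                case '<': out.write("&lt;"); break;
                case '>': out.write("&gt;"); break;
                case '"':
                    if (isAttVal) out.write("&quot;"); else out.write(c);
                    break;
                default: out.write(c);
            }
        }
    };

    // Convenience wrapper that runs the handler over a whole String.
    static String escape(String s, boolean isAttVal) {
        StringWriter w = new StringWriter();
        char[] chars = s.toCharArray();
        try {
            QUOTE_FRIENDLY.escape(chars, 0, chars.length, isAttVal, w);
        } catch (IOException e) {
            // StringWriter never actually throws; this only satisfies the signature.
            throw new UncheckedIOException(e);
        }
        return w.toString();
    }

    public static void main(String[] args) {
        String text = "The woman said, \"How ya doin & stuff?\"";
        System.out.println(escape(text, false)); // The woman said, "How ya doin &amp; stuff?"
        System.out.println(escape(text, true));  // The woman said, &quot;How ya doin &amp; stuff?&quot;
    }
}
```

Registering a real implementation is the same `setProperty` call as above.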
Maher Abuthraa

--

@Elliot, you can use this to make the marshaller enter your escape function. It is weird, but it works if you set "Unicode" instead of "UTF-8". Add this just before or after you set the CharacterEscapeHandler property:

marshaller.setProperty(Marshaller.JAXB_ENCODING, "Unicode");

However, don't judge only by the console in your IDE, because what it shows depends on the workspace encoding. It is better to also check the output in a file, like this:

marshaller.marshal(shipOrder, new File("C:\\shipOrder.txt"));
javatar

--

I found the same issue and fixed it using dom4j's XMLWriter. XMLWriter has isEscapeText() and setEscapeText() methods, and escaping is on by default; if you don't want < to be converted to &lt;, call setEscapeText(false) on the writer you marshal into:

JAXBContext jaxbContext = JAXBContext.newInstance(YourClass.class); // YourClass: your JAXB-annotated root class
Marshaller marshaller = jaxbContext.createMarshaller();

marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

// Create a filter that will remove the xmlns attribute
NamespaceFilter outFilter = new NamespaceFilter(null, false);

// Do some formatting; this is obviously optional and may affect
// performance
OutputFormat format = new OutputFormat();
format.setIndent(true);
format.setNewlines(true);

// Create a new org.dom4j.io.XMLWriter that will serve as the
// ContentHandler for our filter.
XMLWriter writer = new XMLWriter(new FileOutputStream(file), format);
writer.setEscapeText(false); // <----------------- this line
// Attach the writer to the filter
outFilter.setContentHandler(writer);
// marshalling
marshaller.marshal(piaDto, outFilter);
marshaller.marshal(piaDto, System.out);

The writer.setEscapeText(false) change fixed my issue; I hope it helps you too.

JuanDM

--

Seems like it is possible with Sun's JAXB implementation, although I've not done it myself.

laz

--

This works for me after reading other posts:

javax.xml.bind.JAXBContext jc = javax.xml.bind.JAXBContext.newInstance(object);
marshaller = jc.createMarshaller();
marshaller.setProperty(javax.xml.bind.Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(javax.xml.bind.Marshaller.JAXB_ENCODING, "UTF-8");
marshaller.setProperty(CharacterEscapeHandler.class.getName(), new CustomCharacterEscapeHandler());


public static class CustomCharacterEscapeHandler implements CharacterEscapeHandler {
        /**
         * Escape characters inside the buffer and send the output to the Writer.
         * (prevent <b> to be converted &lt;b&gt; but still ok for a<5.)
         */
        public void escape(char[] buf, int start, int len, boolean isAttValue, Writer out) throws IOException {
            if (buf != null){
                StringBuilder sb = new StringBuilder();
                for (int i = start; i < start + len; i++) {
                    char ch = buf[i];

                    // escaping these prevents problems when unmarshalling
                    if (ch == '&') {
                        sb.append("&amp;");
                        continue;
                    }

                    if (ch == '"' && isAttValue) {
                        sb.append("&quot;");
                        continue;
                    }

                    if (ch == '\'' && isAttValue) {
                        sb.append("&apos;");
                        continue;
                    }


                    // otherwise print normally
                    sb.append(ch);
                }

                //Make corrections of unintended changes
                String st = sb.toString();

                st = st.replace("&amp;quot;", "&quot;")
                       .replace("&amp;lt;", "&lt;")
                       .replace("&amp;gt;", "&gt;")
                       .replace("&amp;apos;", "&apos;")
                       .replace("&amp;amp;", "&amp;");

                out.write(st);
            }
        }
    }
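The "corrections" step above undoes double escaping so that entity references already present in the input (`&quot;`, `&lt;`, and so on) survive. The same idea can be expressed as escaping only bare ampersands. This is a minimal sketch with a hypothetical `escapeBareAmpersands` helper; unlike the handler above it also leaves numeric character references such as `&#34;` alone, and like it, it does not touch `<`.

```java
import java.util.regex.Pattern;

public class BareAmpersands {

    // Matches an '&' that does NOT start one of the five predefined XML
    // entity references or a numeric character reference (&#34; or &#x22;).
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    static String escapeBareAmpersands(String s) {
        // '&' has no special meaning in a replacement string, only '$' and '\' do.
        return BARE_AMP.matcher(s).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("doin & stuff, &quot;kept&quot;, &#34;kept&#34;"));
        // doin &amp; stuff, &quot;kept&quot;, &#34;kept&#34;
    }
}
```

Input containing an undefined reference such as `&foo;` has its ampersand escaped, which keeps the output parseable.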
hoaz

--

I checked the XML specification. http://www.w3.org/TR/REC-xml/#sec-references says "well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. " so it appears that the XML parser used by the legacy system is not conformant.

(I know that it does not solve your problem, but it is at least nice to be able to say which component is broken).
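One way to check which side is at fault is to run a conforming parser over the output. The sketch below (hypothetical class `WellFormedCheck`, using only the JDK's `javax.xml.parsers` API) shows that a literal `"` in element text is well-formed while a literal `&` is not. So an escape handler can safely leave `"` unescaped in text, but it must keep `&amp;`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true if the JDK's built-in parser accepts the string as XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but, unlike the default
            // handler, does not print them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<orderperson>The woman said, \"How ya doin &amp; stuff?\"</orderperson>",      // true
            "<orderperson>The woman said, &quot;How ya doin &amp; stuff?&quot;</orderperson>", // true
            "<orderperson>How ya doin & stuff?</orderperson>"                              // false
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```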

Thorbjørn Ravn Andersen

--

For some reason I haven't had time to track down, it worked for me when setting

marshaller.setProperty(Marshaller.JAXB_ENCODING, "utf-8");

As opposed to using "UTF-8" or "Unicode"

I suggest you try them, and as @Javatar said, check them dumping to file using:

marshaller.marshal(shipOrder, new File("<test_file_path>"));

and opening it with a decent text editor like Notepad++.

Community
mamuso

--

Interestingly, when marshalling to a String you can try:

Marshaller marshaller = jaxbContext.createMarshaller();
StringWriter sw = new StringWriter();
marshaller.marshal(data, sw);
sw.toString();

At least for me, this does not escape quotes.

jurisz

--

I would advise against using CharacterEscapeHandler for the reasons mentioned above (it's an internal class). Instead, you can use Woodstox and supply your own EscapingWriterFactory to an XMLStreamWriter. Something like:

XMLOutputFactory2 xmlOutputFactory = (XMLOutputFactory2)XMLOutputFactory.newFactory();
xmlOutputFactory.setProperty(XMLOutputFactory2.P_TEXT_ESCAPER, new EscapingWriterFactory() {

    @Override
    public Writer createEscapingWriterFor(Writer w, String enc) {
        return new EscapingWriter(w);
    }

    @Override
    public Writer createEscapingWriterFor(OutputStream out, String enc) throws UnsupportedEncodingException {
        return new EscapingWriter(new OutputStreamWriter(out, enc));
    }

});

marshaller.marshal(model, xmlOutputFactory.createXMLStreamWriter(out));

An example of how to write an EscapingWriter can be seen in CharacterEscapingTest.
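The answer does not include `EscapingWriter` itself. A minimal sketch, assuming Woodstox passes the raw text content to the writer returned by the factory and expects that writer to do the escaping, is a `java.io.FilterWriter` that escapes `&`, `<` and `>` but passes quotes through. The class body and the `escapeText` helper are an illustration, not code taken from Woodstox or CharacterEscapingTest.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical EscapingWriter: escapes &, < and > in text content,
// but writes double and single quotes unchanged.
public class EscapingWriter extends FilterWriter {

    public EscapingWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        switch (c) {
            case '&': out.write("&amp;"); break;
            case '<': out.write("&lt;"); break;
            case '>': out.write("&gt;"); break;
            default: out.write(c);
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(cbuf[i]);
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(str.charAt(i));
        }
    }

    // Convenience helper for trying the writer out on a single String.
    static String escapeText(String s) {
        StringWriter sw = new StringWriter();
        try (Writer w = new EscapingWriter(sw)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }
}
```

This writer only suits text content. Attribute values that can contain `"` still need `&quot;`.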

samblake

--

After trying all the above solutions, I finally came to a conclusion.

Do your marshalling through a custom escape handler:

final StringWriter sw = new StringWriter();
final Class classType = fixml.getClass();
final JAXBContext jaxbContext = JAXBContext.newInstance(classType);
final Marshaller marshaller = jaxbContext.createMarshaller();
final JAXBElement<T> fixmsg = new JAXBElement<T>(new QName(namespaceURI, localPart), classType, fixml);
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(CharacterEscapeHandler.class.getName(), new JaxbCharacterEscapeHandler());
marshaller.marshal(fixmsg, sw);
return sw.toString();

And the custom escape handler is as follows:

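import java.io.IOException;
import java.io.Writer;

// JAXB RI package; the JDK-internal copy lives in com.sun.xml.internal.bind.marshaller
import com.sun.xml.bind.marshaller.CharacterEscapeHandler;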

public class JaxbCharacterEscapeHandler implements CharacterEscapeHandler {

    public void escape(char[] buf, int start, int len, boolean isAttValue,
                    Writer out) throws IOException {

            for (int i = start; i < start + len; i++) {
                    char ch = buf[i];
                    out.write(ch);
            }
    }
}
Sufiyan Ansari
  • this still doesn't work it converts my --> &13; which I was expecting to be left untouched. – Artanis Zeratul Dec 02 '18 at 07:14
  • @ArtanisZeratul If you are receiving an input < > this will be output as is as defined in escape function. If you will see a function I'm doing nothing but replacing each character with the same character eg < \n > ---> < \n > I'm not doing any special handling. You can handle your case in the same function by just replacing < > --> &13 at out.write(ch). Note: ch is a one charecter. – Sufiyan Ansari Dec 02 '18 at 07:41

--

The simplest way, when using Sun's Marshaller implementation, is to provide your own implementation of CharacterEscapeHandler that does not escape anything.

Marshaller m = jcb.createMarshaller();
m.setProperty(
    "com.sun.xml.bind.marshaller.CharacterEscapeHandler",
    new NullCharacterEscapeHandler());

With

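import java.io.IOException;
import java.io.Writer;

import com.sun.xml.bind.marshaller.CharacterEscapeHandler;

public class NullCharacterEscapeHandler implements CharacterEscapeHandler {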

    public NullCharacterEscapeHandler() {
        super();
    }


    public void escape(char[] ch, int start, int length, boolean isAttVal, Writer writer) throws IOException {
        writer.write( ch, start, length );
    }
}
fred
fred