
Imagine the following situation: we receive an XML file from an external tool. Recently, this XML has started to contain escaped characters in node names or within their richcontent tags, as in the following (simplified) example:

<map>
<node TEXT="Project">
<node TEXT="&#xe4;&#xe4;">
<richcontent TYPE="NOTE"><html>
  <head>

  </head>
  <body>
    <p>
      I am a Note for Node &#228;&#228;!
    </p>
  </body>
</html>
</richcontent>
</node>
</node>
</map>

After unmarshalling the file with JAXB, those escaped characters are unescaped. Unfortunately, I need them to stay as they are, that is, escaped. Is there any way to prevent these characters from being unescaped during unmarshalling?
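
The unescaping is not specific to JAXB: a conforming XML parser resolves numeric character references before the application sees the data. A minimal sketch of that behaviour using the JDK's DOM parser (the class name `CharRefDemo`, the helper `readText`, and the inline sample are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CharRefDemo {

    /** Parses the XML and returns the TEXT attribute of the root element. */
    static String readText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        return root.getAttribute("TEXT");
    }

    public static void main(String[] args) throws Exception {
        // One hexadecimal and one decimal reference to the same character.
        String text = readText("<node TEXT=\"&#xe4;&#228;\"/>");
        // The parser has already resolved both references to U+00E4.
        System.out.println(text.equals("\u00e4\u00e4")); // true
        System.out.println(text.length());               // 2
    }
}
```

Once parsing is done, the information about how the character was originally written is gone, which is why it has to be protected before the parser reads the input.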

While researching, I found many questions about marshalling XML files, where the opposite problem occurs, but those did not help me either.

Is it possible to achieve this with JAXB at all, or do we have to consider switching to a different XML reader API?

Thank you in advance, ymene

crusam
  • To any XML parser it does not matter whether the source document contains `ä`, `&#xe4;` or `&#228;`. Why does it matter in your case? – Jörn Horstmann Feb 15 '12 at 15:53
  • The problem is: after importing the XML data, we merge it with our program data. There we change some details and then want to write those details back to XML for the external tool. Since we did not want to build another object graph just to marshal the data back to XML, we decided to use StAX, as that was simply easier at the time. We never had any escaped characters until now, and unfortunately the external tool expects the characters to still be escaped in order to work. – crusam Feb 15 '12 at 16:13

1 Answer


You only need to replace &# with &amp;# before the parser sees the input, so call

unmarshaller.unmarshal(new AmpersandingStream(new FileInputStream(...)));

with this stream class:

import java.io.IOException;
import java.io.InputStream;

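/**
 * Rewrites every "&#" in the wrapped stream as "&amp;#", so that numeric
 * character references reach the parser as literal text.
 * Works on bytes, so it assumes an ASCII-compatible encoding such as UTF-8.
 */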
public class AmpersandingStream extends InputStream {

    private InputStream in;
    private boolean justReadAmpersand;
    private String lookAhead = "";

    public AmpersandingStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        // Hand out any pending replacement characters first.
        if (!lookAhead.isEmpty()) {
            int c = lookAhead.codePointAt(0);
            lookAhead = lookAhead.substring(Character.charCount(c));
            return c;
        }
        int c = in.read();
        if (c == (int)'#' && justReadAmpersand) {
            // The '&' has already been returned; emit "amp;#" in place of '#'.
            c = (int)'a';
            lookAhead = "mp;#";
        }
        justReadAmpersand = c == (int)'&';
        return c;
    }

    @Override
    public int available() throws IOException {
        return in.available();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // read(byte[]), read(byte[], int, int) and skip(long) are deliberately
    // not overridden: the inherited InputStream versions go through read(),
    // so the rewrite also applies when the parser reads into a byte array.
    // Delegating them straight to the wrapped stream would bypass it.

    @Override
    public boolean markSupported() {
        // mark/reset could not restore justReadAmpersand and lookAhead.
        return false;
    }

}
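
To see why this rewrite preserves the references, here is a minimal sketch that applies the same replacement to a String instead of a stream and parses the result with the JDK's DOM parser. The class `PreserveCharRefDemo` and its helpers `protectCharRefs` and `readText` are hypothetical names used only for this illustration; they are not part of the class above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class PreserveCharRefDemo {

    /** Hypothetical helper: the same "&#" rewrite, applied to a String. */
    static String protectCharRefs(String xml) {
        return xml.replace("&#", "&amp;#");
    }

    /** Parses the XML and returns the TEXT attribute of the root element. */
    static String readText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement()
                .getAttribute("TEXT");
    }

    public static void main(String[] args) throws Exception {
        String original = "<node TEXT=\"&#xe4;&#xe4;\"/>";
        String rewritten = protectCharRefs(original);
        System.out.println(rewritten);           // <node TEXT="&amp;#xe4;&amp;#xe4;"/>
        // The parser only resolves &amp; to &, leaving the reference as text.
        System.out.println(readText(rewritten)); // &#xe4;&#xe4;
    }
}
```

Note that the unmarshalled values now hold the literal text `&#xe4;`. A writer that escapes `&` on output would emit `&amp;#xe4;`, so whether this is sufficient depends on how the data is written back.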
Joop Eggen