
This code is running on BlackBerry JDE v4.2.1. It's in a method that makes web API calls that return XML. Sometimes the XML returned is not well formed, and I need to strip out any invalid characters before parsing.

Currently, I get: org.xml.sax.SAXParseException: Invalid character '' encountered.

I'm looking for a fast way to attach an invalid-character stripper to the input stream, so that the stream flows through the validator/stripper and straight into the parse call. In other words, I'm trying to avoid buffering the whole content of the stream.

Existing code:

handler is an override of DefaultHandler
url is a String containing the API URL

hconn = (HttpConnection) Connector.open(url,Connector.READ_WRITE,true);

...

try{
   XMLParser parser = new XMLParser();
   InputStream input = hconn.openInputStream();
   parser.parse(input, handler);
   input.close();
} catch (SAXException e) {
   Logger.getInstance().error("getViaHTTP() - SAXException - "+e.toString());
}
Lucifer
JR Lawhorne

2 Answers


It's difficult to attach a stripper to the InputStream because streams are byte-oriented, while validity is a property of decoded characters. It makes more sense to do it on a Reader. You could write something like a StripReader that wraps another reader and drops the offending characters. Below is a quick, untested proof of concept:

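import java.io.IOException;
import java.io.Reader;

// Wraps another Reader and silently drops characters that fail the validity check.
public class StripReader extends Reader
{
    private Reader in;

    public StripReader(Reader in)
    {
        this.in = in;
    }

    public boolean markSupported()
    {
        return false;
    }

    public void mark(int readLimit)
    {
        throw new UnsupportedOperationException("Mark not supported");
    }

    public void reset()
    {
        throw new UnsupportedOperationException("Reset not supported");
    }

    // Returns the next acceptable character, or -1 at end of stream.
    public int read() throws IOException
    {
        int next;
        do
        {
            next = in.read();
            // Note: every char a Reader returns (0..0xFFFF) is a valid code point,
            // so this check alone rejects nothing. Substitute an XML-specific test,
            // such as the isValidXMLChar method given in the comments.
        } while(!(next == -1 || Character.isValidCodePoint(next)));

        return next;
    }

    public void close() throws IOException
    {
        in.close();
    }

    // Fills cbuf one character at a time so that every character goes through read().
    public int read(char[] cbuf, int off, int len) throws IOException
    {
        int i, next = 0;
        for(i = 0; i < len; i++)
        {
            next = read();
            if(next == -1)
                break;
            cbuf[off + i] = (char)next;
        }
        // Report end of stream only if nothing at all was read.
        if(i == 0 && next == -1)
            return -1;
        else
            return i;
    }

    public int read(char[] cbuf) throws IOException
    {
        return read(cbuf, 0, cbuf.length);
    }
}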

You would then construct an InputSource from the Reader and do the parse using that InputSource.
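As a rough illustration of that last step, here is a minimal sketch in plain Java SE. It is not BlackBerry code: `SAXParserFactory`/`SAXParser` stand in for the `XMLParser` from the question, the response is assumed to be UTF-8, and `XmlStripReader`, `StripReaderDemo`, and `parseText` are hypothetical names. The filter works on whole buffers and uses the XML 1.0 character ranges rather than `isValidCodePoint`. As a simplification it also drops surrogate code units, so characters outside the Basic Multilingual Plane are lost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical variant of StripReader: drops characters that XML 1.0 forbids. */
class XmlStripReader extends Reader {
    private final Reader in;

    XmlStripReader(Reader in) {
        this.in = in;
    }

    // XML 1.0 character ranges that fit in a single UTF-16 code unit.
    // Simplification: surrogates (0xD800-0xDFFF) are rejected as well.
    static boolean isAllowed(int ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
            || (ch >= 0x20 && ch <= 0xD7FF)
            || (ch >= 0xE000 && ch <= 0xFFFD);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        // Keep reading until at least one allowed character survives or EOF is hit.
        while (count == 0) {
            int n = in.read(cbuf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the allowed characters toward the front of the filled region.
            for (int i = 0; i < n; i++) {
                char c = cbuf[off + i];
                if (isAllowed(c)) {
                    cbuf[off + count++] = c;
                }
            }
        }
        return count;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class StripReaderDemo {
    /** Parses the stream through the filter and returns the concatenated text content. */
    static String parseText(InputStream raw) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // Assumption: the server sends UTF-8.
        Reader reader = new XmlStripReader(new InputStreamReader(raw, "UTF-8"));
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(reader), handler);
        return text.toString();
    }
}
```

Because the InputSource is given a character stream, the parser does no byte decoding of its own, so the encoding has to be chosen when the InputStreamReader is created.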

Matthew Flaschen
  • Since Blackberry apparently doesn't have FilterReader either, I modified the above not to use it. – Matthew Flaschen May 10 '09 at 03:51
  • RIM also doesn't include Character.isValidCodePoint(), so I had to roll my own. But this method does seem to work, on the simulator at least. Hopefully it will also hold up and not be too slow on a real device. Thanks! – JR Lawhorne May 10 '09 at 05:53
  • You're welcome. Just be sure to test well. It's unavoidably going to slow things down since every character must be (re-)checked. However, I don't think I'm doing any unnecessary copying. P.S. I'm curious as to how you implemented isValidCodePoint. – Matthew Flaschen May 10 '09 at 06:21
  • 1
    It's not going to show up well in this comments block but here is the method I use for validating an XML character: private boolean isValidXMLChar(int ch) { if ((ch == 0x9) || (ch == 0xA) || (ch == 0xD) || ((ch >= 0x20) && (ch <= 0xD7FF)) || ((ch >= 0xE000) && (ch <= 0xFFFD)) || ((ch >= 0x10000) && (ch <= 0x10FFFF))) return true; else return false; } – JR Lawhorne May 18 '09 at 16:05
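For readability, the check from the last comment can be laid out as a standalone method. The sketch below is plain Java SE, with `XmlChars` and `strip` as hypothetical names. One caveat worth showing: a Reader returns UTF-16 code units, so the 0x10000 to 0x10FFFF branch can never match a single `read()` result, and characters in that range arrive as two surrogates that the check rejects one by one. The `strip` helper keeps well-formed surrogate pairs and drops lone surrogates. It relies on `Character.isHighSurrogate` and `Character.isLowSurrogate`, which may be missing on BlackBerry just as `isValidCodePoint` was.

```java
final class XmlChars {
    private XmlChars() {
    }

    /** XML 1.0 character ranges, applied to a full code point. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of s without invalid characters, keeping well-formed surrogate pairs. */
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean pair = Character.isHighSurrogate(c)
                && i + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(i + 1));
            if (pair) {
                // A well-formed pair always encodes a code point in 0x10000-0x10FFFF.
                out.append(c).append(s.charAt(i + 1));
                i += 2;
            } else {
                // Single code unit: lone surrogates fail the range check and are dropped.
                if (isValidXmlChar(c)) {
                    out.append(c);
                }
                i++;
            }
        }
        return out.toString();
    }
}
```

The same pairing logic could be moved into the reader by holding back a high surrogate until the next character has been read.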

Use a FilterInputStream. Override FilterInputStream#read to filter the offending bytes.
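A minimal sketch of this byte-level idea follows, written against plain `InputStream` rather than `FilterInputStream`. `ControlByteStripStream` is a hypothetical name, and the sketch assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where a byte below 0x20 always stands for the control character itself. Under that assumption no decoding is needed to remove control characters, but the filter cannot catch U+FFFE, U+FFFF, or malformed byte sequences, and it would corrupt UTF-16 input.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical byte-level filter: drops control bytes that XML 1.0 does not allow. */
class ControlByteStripStream extends InputStream {
    private final InputStream in;

    ControlByteStripStream(InputStream in) {
        this.in = in;
    }

    // In ASCII-compatible encodings only TAB, LF and CR are legal below 0x20.
    private static boolean isAllowed(int b) {
        return b >= 0x20 || b == 0x9 || b == 0xA || b == 0xD;
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = in.read();
        } while (b != -1 && !isAllowed(b));
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        // Keep reading until at least one allowed byte survives or EOF is hit.
        while (count == 0) {
            int n = in.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the allowed bytes toward the front of the filled region.
            for (int i = 0; i < n; i++) {
                int b = buf[off + i] & 0xFF;
                if (isAllowed(b)) {
                    buf[off + count++] = (byte) b;
                }
            }
        }
        return count;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

In the question's code this would wrap the stream returned by `hconn.openInputStream()` before it is handed to `parser.parse(input, handler)`.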

alphazero
  • Problem is that requires duplicating the character-decoding logic in the stream. – Matthew Flaschen May 10 '09 at 03:30
  • 1
    There may not be a way to avoid that without customizing XMLParser? – JR Lawhorne May 10 '09 at 03:33
  • RIM doesn't have FilterInputStream http://www.blackberry.com/developers/docs/4.2.1api/index.html – JR Lawhorne May 10 '09 at 03:37
  • Why not just use a customized XMLParser only when there is a SAXException? It would seem that if you get a bad xml file then it would be best to reject the entire file as the damaged part may lead to bad data being extracted. – James Black May 10 '09 at 03:37