
I have to read a file (existing format not under my control) that contains an XML document and encoded data. Unfortunately, the file also wraps that content in MQ-related data, including hex zeros (end-of-file characters).

So, using Java, how can I read this file, stripping or ignoring the "garbage" I don't need, to get at the XML and encoded data? I believe an acceptable solution is simply to leave out the hex zeros (are there other values that will stop my reading?), since I don't need the MQ information (RFH header) anyway and the counts are meaningless for my purposes.
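On the side question: Java's `InputStream` and `Files` APIs do not treat 0x00 (or any other byte value) as end of file; only `read()` returning -1 marks the end of the stream, so the zeros will not stop the read, they just end up in the data. A minimal sketch of the "leave out the hex zeros" idea follows, assuming the file fits in memory and its text is UTF-8 or plain ASCII. The class and method names are hypothetical. It removes only the zero bytes; the rest of the RFH header stays in the text, so the XML still has to be located afterwards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, not from the question or the answers.
public class StripNulBytes {

    /** Returns a copy of raw with every 0x00 byte removed. */
    public static byte[] withoutNulBytes(byte[] raw) {
        byte[] out = new byte[raw.length];
        int n = 0;
        for (byte b : raw) {
            if (b != 0) {        // skip the hex zeros, keep everything else
                out[n++] = b;
            }
        }
        return Arrays.copyOf(out, n);
    }

    /**
     * Reads the whole file into memory, drops the zero bytes and decodes the
     * rest as text. Assumes the file is small enough to hold in memory and
     * that its text is UTF-8 (invalid sequences, e.g. from binary header
     * fields, become replacement characters rather than errors).
     */
    public static String readWithoutNulBytes(Path path) throws IOException {
        byte[] cleaned = withoutNulBytes(Files.readAllBytes(path));
        return new String(cleaned, StandardCharsets.UTF_8);
    }
}
```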

I have searched a lot and have only found really heinous, complicated "solutions". There must be a better way...

Mark

2 Answers


What worked was to pull out the XML documents. Groovy code:

    public static final String REQUEST_XML     = "<Request>";
    public static final String REQUEST_END_XML = "</Request>";

    /**
     * Adds one EncodedRequest to the requests list (defined elsewhere) for
     * each <Request>...</Request> element in xmlMessage; everything outside
     * those elements (RFH header, hex zeros, counts) is ignored.
     *
     * @param xmlMessage the raw file contents as a String
     */
    private void extractRequests(String xmlMessage) {
        int start = xmlMessage.indexOf(REQUEST_XML);
        while (start >= 0) {                         // each <Request>
            int end = xmlMessage.indexOf(REQUEST_END_XML, start);
            if (end < 0) {
                break;                               // no closing tag: stop
            }
            end += REQUEST_END_XML.length();
            requests.add(new EncodedRequest(xmlMessage.substring(start, end)));
            start = xmlMessage.indexOf(REQUEST_XML, end);
        }
    }
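To try the extraction step on its own, here is a self-contained Java version of the same `indexOf` loop. It returns the matched substrings instead of building `EncodedRequest` objects, since that class isn't shown; `RequestExtractor` and its method signature are hypothetical. It assumes `<Request>` elements are not nested and the opening tag carries no attributes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone version of the answer's extraction loop.
public class RequestExtractor {

    static final String REQUEST_XML = "<Request>";
    static final String REQUEST_END_XML = "</Request>";

    /**
     * Returns every <Request>...</Request> substring of message, in order.
     * Text outside those elements (RFH header, NULs, counts) is skipped.
     * An opening tag with no matching closing tag ends the scan.
     */
    public static List<String> extractRequests(String message) {
        List<String> found = new ArrayList<>();
        int start = message.indexOf(REQUEST_XML);
        while (start >= 0) {
            int end = message.indexOf(REQUEST_END_XML, start);
            if (end < 0) {
                break;  // truncated element: nothing more to extract
            }
            end += REQUEST_END_XML.length();
            found.add(message.substring(start, end));
            start = message.indexOf(REQUEST_XML, end);
        }
        return found;
    }
}
```

Because the zeros and header bytes never fall between `<Request>` and `</Request>`, they don't need to be stripped first when working this way.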

and then decode the Base64 portion:

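    // Base64 here is Apache Commons Codec (org.apache.commons.codec.binary.Base64);
    // getEncodedContents() and the decodedContents field belong to EncodedRequest.
    public String getDecodedContents() {
        if (decodedContents == null) {
            byte[] decoded = Base64.decodeBase64(getEncodedContents().getBytes());
            // new String(byte[]) uses the platform default charset; pass an
            // explicit charset if the payload's encoding is known.
            decodedContents = new String(decoded).replace('\r', '\t');
        }
        return decodedContents;
    }
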
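If you would rather not depend on Commons Codec, the JDK (Java 8 and later) ships `java.util.Base64`. The sketch below uses its MIME decoder, which ignores line separators and other characters outside the Base64 alphabet, and keeps the answer's `\r` to tab replacement. `Base64Payload` is a hypothetical name, and UTF-8 for the decoded payload is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical JDK-only alternative to the Commons Codec call above.
public class Base64Payload {

    /**
     * Decodes a Base64 payload with the JDK's MIME decoder, which skips
     * line breaks and other non-alphabet characters, then converts the
     * bytes to text. Assumes the decoded payload is UTF-8.
     */
    public static String decode(String encoded) {
        byte[] decoded = Base64.getMimeDecoder().decode(encoded);
        String text = new String(decoded, StandardCharsets.UTF_8);
        // Same post-processing as the answer: carriage returns become tabs.
        return text.replace('\r', '\t');
    }
}
```

`Base64.getDecoder()` would instead reject the line breaks that often appear in encoded XML content, which is why the MIME decoder is used here.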
Mark

I've hit this issue before (well ... something similar). Have a look at my FilterInputStream for a file filter that you should be able to modify to your needs.

Essentially it implements a push-back buffer that chucks away anything you don't want.
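The linked class isn't reproduced here. As a rough illustration of the filtering idea only, here is a simpler hypothetical `FilterInputStream` that drops individual 0x00 bytes. It needs no push-back buffer because it never has to look ahead; discarding multi-byte runs such as the RFH header would need the push-back approach described above.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical filter: passes through every byte except 0x00. */
public class NulDroppingInputStream extends FilterInputStream {

    public NulDroppingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = super.read();       // -1 means end of stream
        } while (b == 0);           // discard hex zeros
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept;
        do {
            int n = super.read(buf, off, len);
            if (n < 0) {
                return -1;
            }
            // Compact the chunk in place, keeping only non-zero bytes.
            kept = 0;
            for (int i = 0; i < n; i++) {
                byte b = buf[off + i];
                if (b != 0) {
                    buf[off + kept++] = b;
                }
            }
        } while (kept == 0);        // chunk was all zeros: read again
        return kept;
    }
}
```

Wrapping the file stream, e.g. `new NulDroppingInputStream(new FileInputStream(file))`, lets any reader downstream see the data without the zeros. `skip()` and `available()` are inherited unchanged and still count the dropped bytes.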

OldCurmudgeon