I'm trying to extract attachments from Domino documents that were exported to DXL (Domino XML). For elements with encoding="base64" I can handle the filedata content with ease. However, most of the files have encoding="none", which logically should mean the data is embedded directly. Instead of readable text, though, the element contains 76-character lines that look very much like base64. They do not decode as valid base64 or uuencoded data, nor as anything else I can recognize. Does anyone know what sort of arcane encoding IBM calls "none"? A typical segment looks like this:
<file hosttype='msdos' compression='none' flags='sign storedindoc' encoding='none'
name='myfilename.doc' size='50688' storagesize='32519' desiredcompression='huffman'>
<created><datetime dst='true'>20061110T193351,87-02</datetime></created>
<modified><datetime dst='true'>20061110T193351,73-02</datetime></modified><filedata>
0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAXgAAAAAAAAAA
EAAAYAAAAAEAAAD+////AAAAAF0AAAD/////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
(it goes on for hundreds of lines... up to)
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
</filedata></file>
It looks like some MIME encoding, but it is not base64: the number of bits does not add up and the decoder fails. (Yes, I removed the newlines from the parser feed.)
How do I decode something that is supposedly not encoded? (According to the IBM magi.)
[post-script] I realized that the document does not conform to the DXL DTD: it is parseable but does not validate. Also, although encoding="none", the filedata content is indeed base64, though not necessarily padded with '=' at the end. On top of that, the XML SAX parser was passing me arbitrary chunks of the text content instead of entire lines. Since base64 decodes in groups of 4 characters (each group yielding 3 bytes), chunks whose length is not a multiple of 4 messed up the decoding. If I ignore the DTD and force a carefully buffered base64 decoding, even when @encoding != "base64" (which the DTD says should not be base64), then all goes well. It looks like IBM does not care about following its own DTDs.
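For anyone hitting the same problem, here is a minimal sketch of the buffered decoding described in the post-script, using Python's standard xml.sax module. The class name FileDataHandler and its attributes are my own illustrative choices, not part of any Domino or DXL API, and the sketch assumes each file element carries a name attribute as in the sample above. It ignores the encoding attribute entirely and treats every filedata body as base64.

```python
import base64
import xml.sax


class FileDataHandler(xml.sax.ContentHandler):
    """Hypothetical handler: decodes every <filedata> body as base64."""

    def __init__(self):
        super().__init__()
        self.files = {}            # attachment name -> decoded bytes
        self._name = None
        self._in_filedata = False
        self._pending = ""         # base64 characters not yet decoded
        self._out = bytearray()

    def startElement(self, name, attrs):
        if name == "file":
            self._name = attrs.get("name")
        elif name == "filedata":
            self._in_filedata = True
            self._pending = ""
            self._out = bytearray()

    def characters(self, content):
        # SAX may deliver text in arbitrary chunks, not whole lines.
        if not self._in_filedata:
            return
        # Drop newlines and other whitespace, then decode only whole
        # groups of 4 characters; keep the remainder for the next chunk.
        self._pending += "".join(content.split())
        usable = len(self._pending) - len(self._pending) % 4
        self._out += base64.b64decode(self._pending[:usable])
        self._pending = self._pending[usable:]

    def endElement(self, name):
        if name != "filedata":
            return
        tail = self._pending
        if tail:
            # Restore the '=' padding that the export may have omitted.
            # A single leftover character is invalid and raises an error.
            tail += "=" * (-len(tail) % 4)
            self._out += base64.b64decode(tail)
        self.files[self._name] = bytes(self._out)
        self._in_filedata = False
```

Usage would be along the lines of creating a FileDataHandler, passing it to xml.sax.parse() together with the exported file, and then writing each entry of handler.files to disk. Note that this only undoes the base64 layer; I have not checked whether attachments exported with a compression other than 'none' need further processing.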