How do I process a DXL filedata element when the file encoding is "none"?

Question

I'm trying to extract attachments from Domino documents that were exported to DXL (the Domino XML schema). For elements with encoding="base64" I can handle the filedata content with ease. However, most of the files have encoding="none", which logically should mean the content is embedded directly. Yet the element does not contain readable text. Instead it holds 76-character lines that look very much like base64. They are not valid base64 or uuencoded data, nor anything else I recognize. Does anyone know what sort of arcane encoding this is that IBM calls "none"? A typical segment looks like this:

<file hosttype='msdos' compression='none' flags='sign storedindoc' encoding='none' 
name='myfilename.doc' size='50688' storagesize='32519' desiredcompression='huffman'>
<created><datetime dst='true'>20061110T193351,87-02</datetime></created>
<modified><datetime dst='true'>20061110T193351,73-02</datetime></modified><filedata>
0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAXgAAAAAAAAAA
EAAAYAAAAAEAAAD+////AAAAAF0AAAD/////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////

(it goes on for hundreds of lines... up to)

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
</filedata></file>

It looks like some MIME encoding, but it is not base64: the number of bits does not add up and the decoder fails. (Yes, I removed the newlines from the parser feed.)
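A quick sanity check, sketched in Python with illustrative helper names that are not from the question: decode only the first 12 characters of the quoted payload. If the data really is base64, they should yield the 8-byte OLE2 Compound File signature `D0 CF 11 E0 A1 B1 1A E1`, which opens every legacy binary Word .doc file and would match `name='myfilename.doc'`. The same snippet shows why a fragment whose length is not a multiple of 4, such as a piece a parser might hand over mid-line, cannot be decoded on its own.

```python
import base64
import binascii

# First <filedata> line quoted in the question.
FIRST_LINE = "0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAXgAAAAAAAAAA"

# 8-byte signature that starts every OLE2 Compound File (legacy .doc container).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def decode_head(line: str) -> bytes:
    """Decode the first 12 base64 characters (3 groups of 4) into 9 bytes."""
    return base64.b64decode(line[:12], validate=True)


def decodes_alone(fragment: str) -> bool:
    """Return True if a text fragment is decodable base64 by itself."""
    try:
        base64.b64decode(fragment, validate=True)
        return True
    except binascii.Error:
        return False


if __name__ == "__main__":
    head = decode_head(FIRST_LINE)
    print(head.hex())                       # d0cf11e0a1b11ae100
    print(head.startswith(OLE2_SIGNATURE))  # True
    # A 7-character piece is not a whole number of 4-character groups.
    print(decodes_alone(FIRST_LINE[:7]))    # False
    print(decodes_alone(FIRST_LINE[:8]))    # True
```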

How do I decode something that is supposedly not encoded? (According to the IBM magi.)

[Postscript] I realized that the document does not conform to the DXL DTD, i.e. it is parseable but does not validate. Also, although encoding="none", the filedata content is indeed base64, though not necessarily padded with '=' characters at the end. In addition, the XML SAX parser was passing me chunks of the text content instead of entire lines. Since base64 works on groups of 4 characters (each producing 3 bytes), decoding those chunks separately broke the output. If I ignore the DTD and force a carefully buffered base64 decoding even when @encoding != "base64" (contrary to what the DTD implies), then all goes well. It looks like IBM does not care about following its own DTDs.
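The buffered approach from the postscript can be sketched with Python's standard, non-validating `xml.sax` module. This is a minimal sketch, not the asker's code: `FiledataHandler` and `extract_attachments` are hypothetical names, and it assumes the DXL uses an unprefixed default namespace (so element names arrive as plain `file` and `filedata`) and that each `filedata` sits inside a `file` element carrying a `name` attribute. It ignores the `encoding` attribute, concatenates every `characters()` chunk until `</filedata>`, strips whitespace, and restores missing `=` padding before decoding.

```python
import base64
import xml.sax


class FiledataHandler(xml.sax.ContentHandler):
    """Hypothetical SAX handler that decodes every <filedata> as base64.

    The encoding attribute is ignored on purpose, because the exported
    text may be base64 even when encoding='none'.
    """

    def __init__(self):
        super().__init__()
        self.attachments = {}   # file name -> decoded bytes
        self._file_name = None  # 'name' attribute of the enclosing <file>
        self._chunks = None     # text pieces of the open <filedata>, else None

    def startElement(self, name, attrs):
        if name == "file":
            self._file_name = attrs.get("name")
        elif name == "filedata":
            # Start buffering: SAX may split the text at arbitrary points.
            self._chunks = []

    def characters(self, content):
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name != "filedata" or self._chunks is None:
            return
        # Join all pieces, then drop newlines and any other whitespace.
        text = "".join("".join(self._chunks).split())
        # Restore '=' padding the exporter may have omitted. A length that is
        # 1 more than a multiple of 4 is invalid and will still raise.
        text += "=" * (-len(text) % 4)
        self.attachments[self._file_name] = base64.b64decode(text, validate=True)
        self._chunks = None


def extract_attachments(source):
    """Parse a DXL file (path or binary file object) and return {name: bytes}."""
    handler = FiledataHandler()
    xml.sax.parse(source, handler)
    return handler.attachments
```

Calling `extract_attachments("export.dxl")` (a placeholder path) would map each file name to its decoded bytes, which can then be written out with `open(name, "wb")`. The sketch keeps each payload in memory until its closing tag for simplicity; for very large attachments, decoding incrementally in multiples of 4 characters would avoid that.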

Take a look at the answers to this question: https://stackoverflow.com/questions/12003916/lotus-notes-dxl-notesbitmap-to-gif. The question itself contains a link to a page that shows a file object with encoding='none'. The answers indicate that it is, in fact, base64 data, but there's more to it than that. — Richard Schwartz, Jun 16 '20 at 03:27
I had already checked that, but it is about converting the proprietary Notes bitmap image to GIF. The problem here is that the block is neither a link (those are short headers) nor a base64-encoded block. — Jaccoud, Jun 17 '20 at 13:21
