
I downloaded the mime4j 0.8.0 snapshot from Subversion and built it with Maven. The relevant jars I generated can be found here.

Now I am trying to parse a toy mbox file from the mime4j tests.

I use this sample code. Briefly:

final File mbox = new File("c:\\mbox.rlug");
int count = 0;
for (CharBufferWrapper message : MboxIterator.fromFile(mbox).charset(ENCODER.charset()).build()) {
    System.out.println(messageSummary(message.asInputStream(ENCODER.charset())));
    count++;
}
System.out.println("Found " + count + " messages");

together with this helper method:

private static String messageSummary(InputStream messageBytes) throws IOException, MimeException {
    MessageBuilder builder = new DefaultMessageBuilder();
    Message message = builder.parseMessage(messageBytes);
    return String.format("\nMessage %s \n" +
            "Sent by:\t%s\n" +
            "To:\t%s\n",
            message.getSubject(),
            message.getSender(),
            message.getTo());
}

The output is:

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Found 5 messages

There are indeed 5 messages, but why are all fields null?

zvisofer
  • Could you just print the raw message in the loop, in order to see if it's correctly constructed ? `System.out.println(message);` – ToYonos Dec 02 '14 at 10:41
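One way to follow this suggestion is to print the raw bytes with control characters escaped, so that invisible differences such as line terminators show up in the console. The sketch below is a hypothetical diagnostic helper (the class name `RawDump` and the method `escape` are made up for illustration and are not part of mime4j); it assumes the message bytes have already been read into an array.

```java
// Hypothetical diagnostic helper, not part of mime4j.
final class RawDump {

    /** Renders bytes as text, escaping control and non-ASCII bytes so they become visible. */
    static String escape(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c == '\r') {
                sb.append("\\r");
            } else if (c == '\n') {
                sb.append("\\n\n"); // visible marker followed by a real line break
            } else if (c < 0x20 || c >= 0x7F) {
                sb.append(String.format("\\x%02X", c)); // other control or non-ASCII byte
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

In the loop from the question, the bytes could be read from `message.asInputStream(ENCODER.charset())` before being passed to `escape`.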

3 Answers


Based on @zvisofer's answer, I found the offending piece of code in BufferedLineReaderInputStream:

@Override
public int readLine(final ByteArrayBuffer dst)
        throws MaxLineLimitException, IOException {
    if (dst == null) {
        throw new IllegalArgumentException("Buffer may not be null");
    }
    if (!readAllowed()) return -1;

    int total = 0;
    boolean found = false;
    int bytesRead = 0;
    while (!found) {
        if (!hasBufferedData()) {
            bytesRead = fillBuffer();
            if (bytesRead == -1) {
                break;
            }
        }
        int i = indexOf((byte)'\n');
        int chunk;
        if (i != -1) {
            found = true;
            chunk = i + 1 - pos();
        } else {
            chunk = length();
        }
        if (chunk > 0) {
            dst.append(buf(), pos(), chunk);
            skip(chunk);
            total += chunk;
        }
        if (this.maxLineLen > 0 && dst.length() >= this.maxLineLen) {
            throw new MaxLineLimitException("Maximum line length limit exceeded");
        }
    }
    if (total == 0 && bytesRead == -1) {
        return -1;
    } else {
        return total;
    }
}

The best thing to do would be to report the bug, but here is a fix. It is a little dirty, but it works.

In your project, create a copy of the class org.apache.james.mime4j.io.BufferedLineReaderInputStream.

Replace its method public int readLine(final ByteArrayBuffer dst) with this one:

@Override
public int readLine(final ByteArrayBuffer dst)
        throws MaxLineLimitException, IOException {
    if (dst == null) {
        throw new IllegalArgumentException("Buffer may not be null");
    }
    if (!readAllowed()) return -1;

    int total = 0;
    boolean found = false;
    int bytesRead = 0;
    while (!found) {
        if (!hasBufferedData()) {
            bytesRead = fillBuffer();
            if (bytesRead == -1) {
                break;
            }
        }

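        int chunk;
        // Look for a carriage return first; this assumes it is followed by '\n' (CRLF).
        int i = indexOf((byte)'\r');
        if (i != -1) {
            found = true;
            chunk = i + 2 - pos(); // consume '\r' and the assumed '\n'
        } else {
            // No '\r' in the buffer: fall back to a bare '\n' (Unix line ending).
            i = indexOf((byte)'\n');
            if (i != -1) {
                found = true;
                chunk = i + 1 - pos();
            } else {
                // No terminator buffered yet: take everything and keep reading.
                chunk = length();
            }
        }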
        if (chunk > 0) {
            dst.append(buf(), pos(), chunk);
            skip(chunk);
            total += chunk;
        }
        if (this.maxLineLen > 0 && dst.length() >= this.maxLineLen) {
            throw new MaxLineLimitException("Maximum line length limit exceeded");
        }
    }
    if (total == 0 && bytesRead == -1) {
        return -1;
    } else {
        return total;
    }
}

Enjoy both unix and dos files :)
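The terminator search itself can be sketched outside mime4j. The following is a minimal standalone illustration, not the library's code: the class `LineTerminatorScan` and the method `lineLength` are hypothetical names. It scans forward for the earliest terminator and only consumes two bytes when `\r` is really followed by `\n`. Treating a lone `\r` as a line terminator is an assumption made here for illustration; mime4j's own tests may expect different behavior.

```java
// Hypothetical standalone helper, not part of mime4j.
final class LineTerminatorScan {

    /**
     * Returns the length of the line starting at {@code from}, terminator included,
     * or -1 if buf[from, to) does not yet contain a complete line.
     */
    static int lineLength(byte[] buf, int from, int to) {
        for (int i = from; i < to; i++) {
            if (buf[i] == '\n') {
                return i + 1 - from; // Unix line ending
            }
            if (buf[i] == '\r') {
                if (i + 1 == to) {
                    // '\r' is the last buffered byte: CRLF and lone CR cannot be told apart yet
                    return -1;
                }
                // CRLF consumes two bytes; a lone CR (assumed to be a terminator) consumes one
                return (buf[i + 1] == '\n' ? i + 2 : i + 1) - from;
            }
        }
        return -1; // no terminator in the buffered range
    }
}
```

A caller that gets -1 would need to buffer more data and retry, and at end of stream it would have to hand back whatever bytes remain.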

ToYonos
  • This code causes 5 build tests to fail (one of them is an error). I guess that it will fail if you have a '\r' that is not followed by '\n' – zvisofer Dec 11 '14 at 09:41
  • Yes, I guess my fix could be improved to handle the case where \r appears alone – ToYonos Dec 11 '14 at 09:58
  • using: `byte[] microsoftSucks = {(byte)'\r', (byte)'\n'};` `int i = indexOf(microsoftSucks);` Fixes 3 tests but two are still failing – zvisofer Dec 11 '14 at 10:03

I found the problem.

DefaultMessageBuilder fails to parse mbox files that use Windows line separators (\r\n). After replacing them with Unix line separators (\n), the parsing works.

This is a critical issue, since the mbox files downloaded from Gmail use \r\n.
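As a workaround until the parser is fixed, the file can be converted before it is handed to mime4j. The sketch below is one possible way to do this, not an official recipe: the class `MboxLineEndings` and the method `crlfToLf` are hypothetical names, and the whole file is read into memory, which is an assumption that only holds for mbox files of moderate size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of mime4j.
final class MboxLineEndings {

    /** Returns a copy of the input with every CRLF pair replaced by a single LF. */
    static byte[] crlfToLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]); // a lone CR and all other bytes are kept unchanged
            }
        }
        return out.toByteArray();
    }

    /** Usage: java MboxLineEndings input.mbox output.mbox */
    public static void main(String[] args) throws IOException {
        Path source = Paths.get(args[0]);
        Path target = Paths.get(args[1]);
        Files.write(target, crlfToLf(Files.readAllBytes(source)));
    }
}
```

The converted file can then be passed to `MboxIterator.fromFile(...)` as in the question. For very large exports, a streaming variant that copies byte by byte would avoid loading the file at once.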

zvisofer

I downloaded your jar files, the sample code that you pointed to, and the sample mbox file that you pointed to, compiled the sample (with no changes) and ran it against the sample mbox file.

It worked as expected (the fields contained the expected data, not nulls). This was on a Mac with Java 1.6.0_65, and also with 1.8.0_11.

Output was as follows:

$ java -cp .:apache-mime4j-core-0.8.0-SNAPSHOT.jar:apache-mime4j-dom-0.8.0-SNAPSHOT.jar:apache-mime4j-mbox-iterator-0.8.0-SNAPSHOT.jar IterateOverMbox mbox.rlug.txt

Message Din windows ma pot, din LINUX NU ma pot conecta (la ZAPP) Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Message Re: RH 8.0 boot floppy Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Message Qmail mysql virtualusers +ssl + smtp auth +pop3 Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Message Re: Din windows ma pot, din LINUX NU ma pot conecta (la ZAPP) Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Message LSTP problem - solved Sent by: rlug-bounce@lug.ro To: [rlug@lug.ro]

Found 5 messages Done in: 108 milis

GreyBeardedGeek