I have written a scheduler that runs every minute and uses the IMAP (SSL) API to get all unread messages, marks only 200 of them as read, and then reads the content of each of those 200 messages one by one. The issue I am facing is that when the mailbox holds around 400k messages (100 GB), the first command, which gets the unread messages, itself takes more than 10 minutes. I am not sure whether this is how IMAP behaves or whether something is slow at the mailbox or network level.

Eventually my target is to read around 500k emails in 24 hours from a mailbox where each message is around 250 KB, and then store each HTML message in an Oracle DB as a BLOB. Currently I am nowhere near this target: the code attached below processes only 50 messages a minute.

I would really appreciate it if someone could guide me in fixing any performance issues in my code. Also, if anyone has experience extracting HTML email from a mailbox and persisting it in a database by any means, it would be really helpful if you could share your knowledge. Thanks!

public void readMails() {

    // Messages handed to each worker task; at least 1 so that a threadCount
    // larger than fetchSize cannot produce a zero-sized chunk.
    int arraySize = Math.max(1, fetchSize / threadCount);

    try {
        // Server-side search for every message that does not have the SEEN flag.
        FlagTerm ft = new FlagTerm(new Flags(Flags.Flag.SEEN), false);
        msgs = inbox.search(ft);

        // The end index of copyOfRange is exclusive, so pass fetchSize
        // (fetchSize - 1 would drop one message from every batch).
        if (msgs.length > fetchSize) {
            batchMsgs = Arrays.copyOfRange(msgs, 0, fetchSize);
        } else {
            batchMsgs = msgs;
        }

        // Mark the whole batch as read before processing it.
        inbox.setFlags(batchMsgs, new Flags(Flags.Flag.SEEN), true);

        if (batchMsgs.length != 0) {
            archiveTaskExecutor.initialize();
            List<Message> tempMsgs = new ArrayList<Message>();
            for (Message m : batchMsgs) {
                tempMsgs.add(m);
                if (tempMsgs.size() >= arraySize) {
                    archiveTaskExecutor.execute(new ExtractAndPersist(
                            tempMsgs.toArray(new Message[tempMsgs.size()])));
                    tempMsgs = new ArrayList<Message>();
                }
            }
            // Submit whatever is left over after the last full chunk.
            if (!tempMsgs.isEmpty()) {
                archiveTaskExecutor.execute(new ExtractAndPersist(
                        tempMsgs.toArray(new Message[tempMsgs.size()])));
            }
            archiveTaskExecutor.shutdown();
            try {
                archiveTaskExecutor.getThreadPoolExecutor()
                        .awaitTermination(15, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                archiveTaskExecutor.getThreadPoolExecutor().shutdownNow();
                // Preserve the interrupt status for the calling thread.
                Thread.currentThread().interrupt();
            }
        }
    } catch (Exception e) {
        /** revert all messages to UNREAD here **/
    }
}

private class ExtractAndPersist implements Runnable {

    final Logger log = Logger.getLogger(ExtractAndPersist.class);

    private Message[] messages;

    public ExtractAndPersist(Message[] m) {
        this.messages = m;
    }

    @Override
    public void run() {
        try {
            for (Message message : messages) {
                if (message != null) {
                    // processMessageBody and updateMailContent are helpers not shown here.
                    String mailContent = processMessageBody(message);

                    // mailId and status are declared elsewhere in the original code (not shown).
                    status = updateMailContent(mailId, mailContent);
                }
            }
        } catch (Exception e) {
            /** set messages as UNREAD **/
        }
    }
}


}
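
The batching step in the code above (take at most fetchSize messages, then split them into per-thread chunks) can be sketched on its own with plain arrays. This is only an illustration: the class name BatchSplitter and its methods are hypothetical and not part of the original code, and it works on any element type rather than on Message. It relies on the end index of Arrays.copyOfRange being exclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original scheduler.
class BatchSplitter {

    /** Returns the first fetchSize elements, or the whole array if it is shorter. */
    static <T> T[] firstBatch(T[] all, int fetchSize) {
        // The "to" index is exclusive, so this yields exactly
        // min(all.length, fetchSize) elements.
        return Arrays.copyOfRange(all, 0, Math.min(all.length, fetchSize));
    }

    /** Splits batch into consecutive chunks of at most chunkSize elements. */
    static <T> List<T[]> chunks(T[] batch, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<T[]> result = new ArrayList<>();
        for (int from = 0; from < batch.length; from += chunkSize) {
            int to = Math.min(batch.length, from + chunkSize);
            result.add(Arrays.copyOfRange(batch, from, to));
        }
        return result;
    }
}
```

Each chunk returned by chunks could then be wrapped in one ExtractAndPersist task; the last chunk is simply shorter when the batch does not divide evenly.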
  • What happens if you try to get the unread messages using an IMAP client like Thunderbird? I suspect you will find the limitation is the IMAP server's ability to search through 400K messages. – Jim Garrison Apr 01 '16 at 05:15
  • If you turn on [JavaMail session debugging](http://www.oracle.com/technetwork/java/javamail/faq/index.html#debug) you should see that JavaMail is sending a single SEARCH command to the server, where all the time is being spent searching for matching messages. There's not much you can do about that, but you can make your subsequent processing of the messages faster by using the [Folder.fetch](https://javamail.java.net/nonav/docs/api/javax/mail/Folder.html#fetch-javax.mail.Message:A-javax.mail.FetchProfile-) method. – Bill Shannon Apr 01 '16 at 06:29
  • Thanks, Jim and Bill, for the comments. I removed the unseen search and now use fetch with a message range instead, as suggested at http://stackoverflow.com/questions/8322836/javamail-imap-over-ssl-quite-slow-bulk-fetching-multiple-messages . This improved performance a lot: I can now process 1,000 messages in 1 minute. Still far behind the target. – user2979919 Apr 05 '16 at 04:12
  • This is what finally helped me: http://stackoverflow.com/questions/8322836/javamail-imap-over-ssl-quite-slow-bulk-fetching-multiple-messages – user2979919 Apr 10 '16 at 01:24

1 Answer

At a guess, the problem you're facing is that your IMAP server stores flags in the message, so that search means 100GB of disk I/O. Storing the flags there is stupid, but at least one IMAP server does it.

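If I'm right, then you can speed it up quite a bit by using a range search. The search you do now is unseen. The one you should be doing is uid 12345:* unseen, where 12345 is one higher than the highest UID you've processed before. That frees the IMAP server from having to look at the first part of the mailbox. In JavaMail I think the code will look like new AndTerm(new MessageNumberTerm(...), new FlagTerm(...)), although MessageNumberTerm matches a message sequence number rather than a UID, so the range may have to be expressed differently, for example by restricting the search to the messages returned by UIDFolder.getMessagesByUID(...).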
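
The bookkeeping behind such a range search could look roughly like the sketch below. The class UidRangeSearch and its method names are hypothetical; the code only builds the criteria string and tracks the highest processed UID, and it deliberately leaves out how the command is sent, since that depends on the client library. One detail from RFC 3501 is worth handling: a range such as 12345:* always includes the last message in the mailbox, even when 12345 is higher than every assigned UID, so the returned UIDs are filtered on the client side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a JavaMail API.
class UidRangeSearch {

    /** Builds the IMAP SEARCH criteria for unseen messages above the given UID. */
    static String criteria(long lastProcessedUid) {
        return "UID " + (lastProcessedUid + 1) + ":* UNSEEN";
    }

    /**
     * Keeps only UIDs above the high-water mark. Needed because "n:*" can
     * return the last message of the mailbox even if its UID is below n.
     */
    static List<Long> onlyNew(List<Long> returnedUids, long lastProcessedUid) {
        List<Long> fresh = new ArrayList<>();
        for (long uid : returnedUids) {
            if (uid > lastProcessedUid) {
                fresh.add(uid);
            }
        }
        return fresh;
    }

    /** Returns the new high-water mark after a batch has been processed. */
    static long highWaterMark(List<Long> processedUids, long previous) {
        long max = previous;
        for (long uid : processedUids) {
            max = Math.max(max, uid);
        }
        return max;
    }
}
```

The high-water mark would have to be persisted between scheduler runs (for example next to the stored messages) for the next search to start where the previous one stopped.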

The way to high performance is to use all of the search result, though. Use it at once or cache it, but don't throw it away. Throwing away the result of a remote IMAP operation doesn't lead to high performance.
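
One way to avoid throwing the search result away is to keep it in a small cache and only repeat the remote search once the cache has been drained. The sketch below assumes the result is a list of UIDs; PendingUidCache is a hypothetical class, and the Supplier stands in for the expensive remote SEARCH, which is not shown. Cached UIDs can go stale if other clients delete or flag messages in the meantime, so this is a starting point rather than a complete design.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical cache for illustration; the Supplier represents the remote search.
class PendingUidCache {

    private final Deque<Long> pending = new ArrayDeque<>();
    private final Supplier<List<Long>> search;
    private int searchCalls = 0;

    PendingUidCache(Supplier<List<Long>> search) {
        this.search = search;
    }

    /** Returns up to batchSize UIDs, searching remotely only when the cache is empty. */
    List<Long> nextBatch(int batchSize) {
        if (pending.isEmpty()) {
            pending.addAll(search.get());
            searchCalls++;
        }
        List<Long> batch = new ArrayList<>();
        while (batch.size() < batchSize && !pending.isEmpty()) {
            batch.add(pending.poll());
        }
        return batch;
    }

    /** Number of remote searches performed so far. */
    int searchCalls() {
        return searchCalls;
    }
}
```

With a scheduler that takes 200 messages per run, a single search returning many thousands of UIDs would then serve many consecutive runs instead of being repeated every minute.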

arnt
  • I will try the search you suggested. For now I have switched to getting messages by range rather than searching for unseen ones. I also found that the getContent(message) API provided by IMAP is slow when a lot of messages have to be processed: it was taking around 2500 ms to download the content of one email. I improved performance by doing a bulk fetch, following http://stackoverflow.com/questions/8322836/javamail-imap-over-ssl-quite-slow-bulk-fetching-multiple-messages , but I am still far behind my target. – user2979919 Apr 05 '16 at 04:17
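
For scale, the figures quoted in this thread (500k messages in 24 hours, about 250 KB each, about 2500 ms per message when downloading one at a time) can be turned into a rough budget. The sketch below is plain arithmetic; the class name ThroughputBudget is hypothetical, 1 KB is taken as 1,000 bytes, and the connection count assumes throughput scales linearly with parallel connections, which a real IMAP server may not allow.

```java
// Hypothetical back-of-the-envelope calculation using the numbers from the thread.
class ThroughputBudget {

    static final long TARGET_MESSAGES_PER_DAY = 500_000L; // target from the question
    static final long AVG_MESSAGE_BYTES = 250_000L;       // 250 KB, with 1 KB = 1,000 bytes
    static final long MINUTES_PER_DAY = 24L * 60L;

    /** Messages per minute needed to hit the daily target. */
    static double requiredMessagesPerMinute() {
        return (double) TARGET_MESSAGES_PER_DAY / MINUTES_PER_DAY;
    }

    /** Sustained download rate implied by the target, in bytes per second. */
    static double requiredBytesPerSecond() {
        return (double) TARGET_MESSAGES_PER_DAY * AVG_MESSAGE_BYTES / (MINUTES_PER_DAY * 60L);
    }

    /** Messages per minute one connection achieves at the given per-message latency. */
    static double messagesPerMinutePerConnection(long millisPerMessage) {
        return 60_000.0 / millisPerMessage;
    }

    /** Parallel connections needed, assuming linear scaling (an assumption). */
    static int connectionsNeeded(long millisPerMessage) {
        return (int) Math.ceil(
                requiredMessagesPerMinute() / messagesPerMinutePerConnection(millisPerMessage));
    }

    public static void main(String[] args) {
        System.out.println("required messages/minute: " + requiredMessagesPerMinute());
        System.out.println("required bytes/second:    " + requiredBytesPerSecond());
        System.out.println("connections at 2500 ms:   " + connectionsNeeded(2500));
    }
}
```

Under these assumptions the target works out to roughly 347 messages per minute and about 1.45 MB/s of sustained download, and at 2500 ms per message a single connection manages 24 messages per minute, so about 15 such connections would be needed unless the per-message cost is reduced by bulk fetching.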