67

I'm developing a mail client using javax.mail to read the messages in a mailbox:

Properties properties = System.getProperties();  
properties.setProperty("mail.store.protocol", "imap");  
try {  
    Session session = Session.getDefaultInstance(properties, null);
    Store store = session.getStore("pop3");//create store instance  
    store.connect("pop3.domain.it", "mail.it", "*****");  
    Folder inbox = store.getFolder("inbox");  
    FlagTerm ft = new FlagTerm(new Flags(Flags.Flag.SEEN), false);
    inbox.open(Folder.READ_ONLY);//set access type of Inbox  
    Message messages[] = inbox.search(ft);
    String mail,sub,bodyText="";
    Object body;
    for(Message message:messages) {
        mail = message.getFrom()[0].toString();
        sub = message.getSubject();
        body = message.getContent();
        //bodyText = body.....
    }
} catch (Exception e) {  
    System.out.println(e);    
}

I know that getContent() returns an Object because the content could be a String, a MimeMultipart, a SharedByteArrayInputStream, and possibly other types. Is there a way to always get the text of the message body? Thanks!

Jayyrus
  • What kind of output are you getting? Can't you use `msg.getContentType()` to identify the type and process the mail based on it? – Raghav Jun 28 '12 at 08:14
  • I don't need to know what type the content is; I only need the text inside it. – Jayyrus Jun 28 '12 at 08:16
  • Mail with different MIME types needs to be handled in different ways in order to get the text, so you need to switch on `getContentType`. – Raghav Jun 28 '12 at 08:23
  • There's a really oddball mix of POP3 and IMAP stuff in here. – dkarp Aug 01 '12 at 05:18
  • See this as well http://stackoverflow.com/questions/5628395/javamail-parsing-email-content-cant-seem-to-get-it-to-work-message-getcont/26142591#26142591 – NoNaMe Oct 01 '14 at 14:03

10 Answers

99

This answer extends yurin's answer. The issue he brought up was that the content of a MimeMultipart may itself be another MimeMultipart. The getTextFromMimeMultipart() method below recurses in such cases on the content until the message body has been fully parsed.

private String getTextFromMessage(Message message) throws MessagingException, IOException {
    if (message.isMimeType("text/plain")) {
        return message.getContent().toString();
    } 
    if (message.isMimeType("multipart/*")) {
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        return getTextFromMimeMultipart(mimeMultipart);
    }
    return "";
}

private String getTextFromMimeMultipart(
        MimeMultipart mimeMultipart)  throws MessagingException, IOException{
    String result = "";
    for (int i = 0; i < mimeMultipart.getCount(); i++) {
        BodyPart bodyPart = mimeMultipart.getBodyPart(i);
        if (bodyPart.isMimeType("text/plain")) {
            return result + "\n" + bodyPart.getContent(); // without return, same text appears twice in my tests
        } 
        result += this.parseBodyPart(bodyPart);
    }
    return result;
}

private String parseBodyPart(BodyPart bodyPart) throws MessagingException, IOException { 
    if (bodyPart.isMimeType("text/html")) {
        return "\n" + org.jsoup.Jsoup
            .parse(bodyPart.getContent().toString())
            .text();
    } 
    if (bodyPart.getContent() instanceof MimeMultipart){
        return getTextFromMimeMultipart((MimeMultipart)bodyPart.getContent());
    }

    return "";
}
Mike Warren
Austin
  • JFYI: In Oracle's JavaMail FAQ, they handle `multipart/alternative` differently: http://www.oracle.com/technetwork/java/javamail/faq/index.html#mainbody. Not sure why they do it that way, as I am not familiar with `multipart`. – Abhishek Gupta Mar 25 '17 at 13:28
  • @AbhishekGupta The difference is that for `multipart/alternative`, the user agent is only supposed to choose one part, not concatenate. The FAQ code does this whereas the above code does not. See my answer below for more detail. – hendalst Nov 01 '18 at 07:53
27

This answer extends Austin's answer to correct the original issue with the treatment of multipart/alternative (// without break same text appears twice in my tests).

The text appears twice because for multipart/alternative, the user agent is expected to choose only one part.

From RFC2046:

The "multipart/alternative" type is syntactically identical to "multipart/mixed", but the semantics are different. In particular, each of the body parts is an "alternative" version of the same information.

Systems should recognize that the content of the various parts are interchangeable. Systems should choose the "best" type based on the local environment and references, in some cases even through user interaction. As with "multipart/mixed", the order of body parts is significant. In this case, the alternatives appear in an order of increasing faithfulness to the original content. In general, the best choice is the LAST part of a type supported by the recipient system's local environment.
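The selection rule quoted above can be sketched without any mail library. This is a minimal illustration, not JavaMail code: the class name AlternativeChooser, the method chooseBest, and the idea of passing the parts' content types as plain strings are all assumptions made for the example. It walks the alternatives from last to first and returns the index of the first one whose type the caller says it supports.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of JavaMail): models the RFC 2046 rule for
// multipart/alternative using plain content-type strings.
final class AlternativeChooser {

    // Returns the index of the LAST part whose type is in `supported`
    // (which must hold lower-case "type/subtype" strings), or -1 if none is.
    static int chooseBest(List<String> partTypes, Set<String> supported) {
        for (int i = partTypes.size() - 1; i >= 0; i--) {
            // Drop parameters such as "; charset=UTF-8" and normalise case.
            String type = partTypes.get(i).split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (supported.contains(type)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> parts = List.of("text/plain; charset=UTF-8", "text/html; charset=UTF-8");
        System.out.println(chooseBest(parts, Set.of("text/plain", "text/html"))); // 1
        System.out.println(chooseBest(parts, Set.of("text/plain")));              // 0
    }
}
```

A client that only wants plain text would pass Set.of("text/plain"), so it picks the plain-text alternative even when an HTML one comes later.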

Same example with treatment for alternatives:

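import java.io.IOException;
import javax.mail.BodyPart;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.internet.ContentType;
import javax.mail.internet.MimeMultipart;

private String getTextFromMessage(Message message) throws IOException, MessagingException {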
    String result = "";
    if (message.isMimeType("text/plain")) {
        result = message.getContent().toString();
    } else if (message.isMimeType("multipart/*")) {
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        result = getTextFromMimeMultipart(mimeMultipart);
    }
    return result;
}

private String getTextFromMimeMultipart(
        MimeMultipart mimeMultipart) throws IOException, MessagingException {

    int count = mimeMultipart.getCount();
    if (count == 0)
        throw new MessagingException("Multipart with no body parts not supported.");
    boolean multipartAlt = new ContentType(mimeMultipart.getContentType()).match("multipart/alternative");
    if (multipartAlt)
        // alternatives appear in an order of increasing 
        // faithfulness to the original content. Customize as req'd.
        return getTextFromBodyPart(mimeMultipart.getBodyPart(count - 1));
    String result = "";
    for (int i = 0; i < count; i++) {
        BodyPart bodyPart = mimeMultipart.getBodyPart(i);
        result += getTextFromBodyPart(bodyPart);
    }
    return result;
}

private String getTextFromBodyPart(
        BodyPart bodyPart) throws IOException, MessagingException {
    
    String result = "";
    if (bodyPart.isMimeType("text/plain")) {
        result = (String) bodyPart.getContent();
    } else if (bodyPart.isMimeType("text/html")) {
        String html = (String) bodyPart.getContent();
        result = org.jsoup.Jsoup.parse(html).text();
    } else if (bodyPart.getContent() instanceof MimeMultipart){
        result = getTextFromMimeMultipart((MimeMultipart)bodyPart.getContent());
    }
    return result;
}

Note that this is a very simple example. It misses many cases and should not be used in production in its current form.
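One of the cases this misses is content that does not arrive as a String: getContent() can hand back an InputStream when no handler converts the part, and the (String) casts above then fail. A minimal sketch of a more defensive conversion follows, using only the JDK. The class ContentText and the method asText are hypothetical names, and passing in a fallback charset is an assumption; a real client would read the charset from the part's Content-Type header.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of JavaMail): turns whatever getContent()
// returned into text without assuming it is already a String.
final class ContentText {

    static String asText(Object content, Charset fallbackCharset) {
        if (content == null) {
            return "";
        }
        if (content instanceof String) {
            return (String) content;
        }
        if (content instanceof InputStream) {
            // Read the whole stream and decode it with the caller's charset.
            try (InputStream in = (InputStream) content) {
                return new String(in.readAllBytes(), fallbackCharset);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Multiparts and other object types are not text; callers handle them separately.
        return "";
    }

    public static void main(String[] args) {
        InputStream stream = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(asText("already text", StandardCharsets.UTF_8)); // already text
        System.out.println(asText(stream, StandardCharsets.UTF_8));         // hello
    }
}
```

With such a helper, result = (String) bodyPart.getContent() could become result = ContentText.asText(bodyPart.getContent(), StandardCharsets.UTF_8), as long as the multipart check runs before it.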

Community
hendalst
  • I am getting this error: java.lang.ClassCastException: javax.mail.util.SharedByteArrayInputStream cannot be cast to javax.mail.internet.MimeMultipart – Jerry May 30 '17 at 10:29
  • This is a really great example - the best currently on the internet, thanks. – Zach Alberico Oct 18 '17 at 21:14
  • For Gmail, this doesn't return the mail body. It always throws a null pointer exception at `String html = (String) bodyPart.getContent();`. What could be the issue? – Paresh May 18 '18 at 22:53
  • This example works exactly as I expected. .eml mail messages can have a complicated hierarchy, and it looks like this class is able to cover all cases. Additionally, I must say that the `javax.mail` library is quick and reliable. Good choice. – hariprasad May 24 '19 at 12:26
  • @OverrockSTAR you may need to add an additional condition to getTextFromMessage at line 5: `} else if (message.isMimeType("text/html")) { result = org.jsoup.Jsoup.parse(message.getContent().toString()).text();` – Dylan Smith Jun 13 '19 at 15:00
  • I don't understand why they didn't provide `.getParts()` that we could iterate over and then determine which one we want. We could even do a filter. Instead we have to do 0, 1, 2, 3.... – romulusnr Apr 10 '20 at 04:26
25

Don't reinvent the wheel! You can simply use Apache Commons Email.

Kotlin example:

fun readHtmlContent(message: MimeMessage) = 
        MimeMessageParser(message).parse().htmlContent

If the email has no HTML content but does have plain-text content (you can check with the hasHtmlContent and hasPlainContent methods), use this code instead:

fun readPlainContent(message: MimeMessage) = 
        MimeMessageParser(message).parse().plainContent

Java example:

String readHtmlContent(MimeMessage message) throws Exception {
    return new MimeMessageParser(message).parse().getHtmlContent();
}

String readPlainContent(MimeMessage message) throws Exception {
    return new MimeMessageParser(message).parse().getPlainContent();
}
grolegor
  • This is just brilliant! The Java part does the trick perfectly and it's just simple and clean. – Sammy Apr 11 '19 at 15:36
  • Give this man an award! I tried in vain for three days to implement what this library basically does. Thanks man, you're a life saver :) – Ahmet Eroğlu Jun 17 '20 at 10:42
  • Be careful, as the parse() method will load attachments into memory. If you're only looking for messages that have attachments, or want to get attachment file names, this will end up loading all of the attachments into memory to do so! – dukethrash Aug 26 '22 at 06:27
14

Below is a method that extracts the text from a message whose body parts are plain text or HTML.

import javax.mail.BodyPart;
import javax.mail.Message;
import javax.mail.internet.MimeMultipart;
import org.jsoup.Jsoup;

....
private String getTextFromMessage(Message message) throws Exception {
    if (message.isMimeType("text/plain")) {
        return message.getContent().toString();
    } else if (message.isMimeType("multipart/*")) {
        String result = "";
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        int count = mimeMultipart.getCount();
        for (int i = 0; i < count; i++) {
            BodyPart bodyPart = mimeMultipart.getBodyPart(i);
            if (bodyPart.isMimeType("text/plain")) {
                result = result + "\n" + bodyPart.getContent();
                break;  //without break same text appears twice in my tests
            } else if (bodyPart.isMimeType("text/html")) {
                String html = (String) bodyPart.getContent();
                result = result + "\n" + Jsoup.parse(html).text();
            }
        }
        return result;
    }
    return "";
}

Update: a body part can itself be of type multipart. (I encountered such an email after writing this answer.) In that case you will need to rewrite the method above to use recursion.

Yuriy N.
  • `//without break same text appears twice in my tests` - This is because you are not differentiating between `multipart/alternative` and `multipart/mixed`. `multipart/alternative` means that the parts contain the same information, but in different representations. In this case, the user agent is expected to choose only one. See [here](https://tools.ietf.org/html/rfc2046) – hendalst Apr 29 '16 at 06:35
  • @hendlast Thank you. – Yuriy N. Apr 29 '16 at 12:50
  • Welcome. See below for an example of how to deal with this. In general (per the RFC) you should take the last element, although in this case plain text is preferred, so looping through the body parts to find a plain-text version is probably ideal. – hendalst Apr 29 '16 at 16:15
10

I don't think so; otherwise, what would happen if a Part's MIME type is image/jpeg? The API returns an Object because internally it tries to give you something useful, provided you know what to expect. For general-purpose software, it's intended to be used like this:

if (part.isMimeType("text/plain")) {
   ...
} else if (part.isMimeType("multipart/*")) {
   ...
} else if (part.isMimeType("message/rfc822")) {
   ...
} else {
   ...
}

You also have the raw (actually not so raw, see the Javadoc) Part.getInputStream(), but I think it's unsafe to assume that each and every message you receive is a text-based one - unless you are writing a very specific application and you have control over the input source.
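For readers unsure what isMimeType("multipart/*") compares, the sketch below models the documented behaviour in plain Java: only the primary type and subtype are compared, parameters are ignored, and a subtype of * in the pattern matches anything. It is a simplified stand-in written for illustration, not the library's implementation; the class MimeMatch and the method matches are hypothetical names.

```java
import java.util.Locale;

// Hypothetical model of Part.isMimeType semantics (not JavaMail code).
final class MimeMatch {

    // True when `contentType` has the same primary type as `pattern` and
    // either the same subtype or the pattern's subtype is "*".
    static boolean matches(String contentType, String pattern) {
        String[] actual = split(contentType);
        String[] wanted = split(pattern);
        if (!actual[0].equals(wanted[0])) {
            return false;
        }
        return wanted[1].equals("*") || actual[1].equals(wanted[1]);
    }

    // Splits "type/subtype; params" into {type, subtype}, lower-cased,
    // discarding any parameters after the first ';'.
    private static String[] split(String value) {
        String base = value.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        int slash = base.indexOf('/');
        if (slash < 0) {
            return new String[] {base, ""};
        }
        return new String[] {base.substring(0, slash), base.substring(slash + 1)};
    }

    public static void main(String[] args) {
        System.out.println(matches("text/plain; charset=UTF-8", "text/plain"));           // true
        System.out.println(matches("multipart/alternative; boundary=x", "multipart/*")); // true
        System.out.println(matches("image/jpeg", "text/plain"));                          // false
    }
}
```

This is why the if/else chain above can test "multipart/*" once and catch mixed, alternative, and related messages alike.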

Raffaele
  • [`javax.mail.Message`](http://javamail.kenai.com/nonav/javadocs/index.html?javax/mail/Message.html) implements the `javax.mail.Part` interface – Raffaele Jun 29 '12 at 08:14
4

In my case I wanted the HTML to be preserved as well, and I was also looking for a ready-made utility, so I solved it with the following code:

import javax.mail.Message;
import javax.mail.internet.MimeUtility;
import org.apache.commons.io.IOUtils;
.....
// Assumes the body is quoted-printable encoded text in UTF-8
String body = IOUtils.toString(
        MimeUtility.decode(message.getInputStream(), "quoted-printable"),
        "UTF-8");
Youans
3

If you only ever want text, you can skip other types such as multipart:

Object body = message.getContent();
if (body instanceof String) {
    // hey, it's text
}
raggsss
JAVAGeek
0

My answer is an extended version of Austin's answer, with one extra condition in the first method (getTextFromMessage()).

Change: we should also check whether the MIME type is "text/html".

Check the lines ending with '// **'.

private String getTextFromMessage(Message message) throws MessagingException, IOException {
    String result = "";
    if (message.isMimeType("text/plain")) {
        result = message.getContent().toString();
    } 

    else if (message.isMimeType("text/html")) { // **
        result = message.getContent().toString(); // **
    }

    else if (message.isMimeType("multipart/*")) {
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        result = getTextFromMimeMultipart(mimeMultipart);
    }
    return result;
}

private String getTextFromMimeMultipart(
        MimeMultipart mimeMultipart)  throws MessagingException, IOException{
    String result = "";
    int count = mimeMultipart.getCount();
    for (int i = 0; i < count; i++) {
        BodyPart bodyPart = mimeMultipart.getBodyPart(i);
        if (bodyPart.isMimeType("text/plain")) {
            result = result + "\n" + bodyPart.getContent();
            break; // without break same text appears twice in my tests
        } else if (bodyPart.isMimeType("text/html")) {
            String html = (String) bodyPart.getContent();
            result = result + "\n" + org.jsoup.Jsoup.parse(html).text();
        } else if (bodyPart.getContent() instanceof MimeMultipart){
            result = result + getTextFromMimeMultipart((MimeMultipart)bodyPart.getContent());
        }
    }
    return result;
}
Vishal Patel
0

You could use org.apache.commons.mail.util.MimeMessageParser

Java:

String htmlContent = new MimeMessageParser(message).parse().getHtmlContent();

Kotlin:

val htmlContent: String = MimeMessageParser(message).parse().htmlContent
Sabina Orazem
0

Here is the code I use in my IMAP Android application. It works.

getTextFromMessage returns plain text or an HTML string.

Kotlin

    @Throws(IOException::class, MessagingException::class)
    private fun getTextFromMessage(message: Message): String {
        var result: String = ""
        if (message.isMimeType("text/plain")) {
            result = message.content.toString()
        }
        else if (message.isMimeType("multipart/*")) {
            val mimeMultipart =
                message.content as MimeMultipart
            result = getTextFromMimeMultipart(mimeMultipart)
        }
        else if(message.isMimeType("text/html")){
            result = message.content.toString()
        }
        return result
    }

    @Throws(IOException::class, MessagingException::class)
    private fun getTextFromMimeMultipart(
        mimeMultipart: MimeMultipart
    ): String {
        val count = mimeMultipart.count
        if (count == 0) throw MessagingException("Multipart with no body parts not supported.")

        val multipartRelated = ContentType(mimeMultipart.contentType).match("multipart/related")


        if(multipartRelated){
            val part = mimeMultipart.getBodyPart(0)
            val multipartAlt = ContentType(part.contentType).match("multipart/alternative")
            if(multipartAlt) {
                return getTextFromMimeMultipart(part.content as MimeMultipart)
            }
        }else{
            val multipartAlt = ContentType(mimeMultipart.contentType).match("multipart/alternative")
            if (multipartAlt) {
                for (i in 0 until count) {
                    val part = mimeMultipart.getBodyPart(i)
                    if (part.isMimeType("text/html")) {
                        return getTextFromBodyPart(part)
                    }
                }
            }
        }


        var result: String = ""
        for (i in 0 until count) {
            val bodyPart = mimeMultipart.getBodyPart(i)
            result += getTextFromBodyPart(bodyPart)
        }
        return result
    }

    @Throws(IOException::class, MessagingException::class)
    private fun getTextFromBodyPart(
        bodyPart: BodyPart
    ): String {
        var result: String = ""
        if (bodyPart.isMimeType("text/plain")) {
            result = bodyPart.content as String
        } else if (bodyPart.isMimeType("text/html")) {
            val html = bodyPart.content as String
            result = html
        } else if (bodyPart.content is MimeMultipart) {
            result =
                getTextFromMimeMultipart(bodyPart.content as MimeMultipart)
        }
        return result
    }
Boken
andermirik