
This is what I'm getting with javax.mail.BodyPart.writeTo(..):

Content-Type: text/plain; charset=windows-1252 
Content-Transfer-Encoding: quoted-printable  

some text *again*  

=97 
Bobby   

On Wed, Feb 8, 2012 at 11:51 AM, Alex Johnson <alex@example.com> wrot= 
e:  

> let's try again 
> and again

I want to clean up this text and convert it to UTF-8, so that I end up with exactly this:

some text *again*

--
Bobby

I'm sure I'm not the first one to face this problem. Do you know of any Java libraries that can help?

yegor256

1 Answer


Removing the included message is just a string-manipulation problem; I'm sure you can figure that out yourself using regular expressions or something similar.
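As a rough illustration of the regular-expression route, here is a minimal sketch. It assumes the body has already been decoded (so the soft line break is joined and the attribution reads "wrote:") and that the included message starts with a Gmail-style "On ..., ... wrote:" line. It cuts everything from that line onward. The class and method names are hypothetical, and the pattern is a heuristic that will miss other mailers' formats.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripQuotedReply {

    // Heuristic, not a standard: a Gmail-style attribution line such as
    // "On Wed, Feb 8, 2012 at 11:51 AM, Alex Johnson <alex@example.com> wrote:".
    // MULTILINE makes ^ and $ match at line boundaries; ".+" does not cross newlines.
    private static final Pattern ATTRIBUTION =
            Pattern.compile("^On .+ wrote:\\s*$", Pattern.MULTILINE);

    /** Returns the text before the first attribution line, minus trailing whitespace. */
    static String stripQuotedReply(String body) {
        Matcher m = ATTRIBUTION.matcher(body);
        String kept = m.find() ? body.substring(0, m.start()) : body;
        return kept.replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        String body = "some text *again*\n\n\u2014\nBobby\n\n"
                + "On Wed, Feb 8, 2012 at 11:51 AM, Alex Johnson <alex@example.com> wrote:\n\n"
                + "> let's try again\n> and again\n";
        System.out.println(stripQuotedReply(body));
    }
}
```

If no attribution line is found, the body is returned unchanged apart from trailing whitespace; any "> " lines above the attribution are left alone.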

You can access the content of the body part, without the headers and with both the transfer encoding and the charset handled for you, simply by calling its getContent method. For a text/plain part it usually returns a String.
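For reference, getContent spares you two decoding steps: undoing the quoted-printable transfer encoding ("=97" encodes the byte 0x97, and a trailing "=" joins "wrot" and "e:") and interpreting the bytes as windows-1252, where 0x97 is an em dash. The stdlib-only sketch below does both by hand just to show what is involved. It is not JavaMail's implementation, and decodeQuotedPrintable is a hypothetical helper. Note that a Java String is already Unicode, so "converting to UTF-8" only happens when you encode it to bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableDemo {

    /**
     * Minimal quoted-printable decoder (RFC 2045, section 6.7). A sketch for
     * illustration only; in practice let getContent() do this for you.
     */
    static String decodeQuotedPrintable(String encoded, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        String[] lines = encoded.split("\r?\n", -1);
        for (int i = 0; i < lines.length; i++) {
            // Trailing whitespace on an encoded line is not part of the data.
            String line = lines[i].replaceAll("[ \t]+$", "");
            // A trailing "=" is a soft line break: join with the next line.
            boolean softBreak = line.endsWith("=");
            if (softBreak) {
                line = line.substring(0, line.length() - 1);
            }
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (c == '=' && j + 2 < line.length()) {
                    // "=XX" encodes one byte as two hex digits, e.g. "=97" -> 0x97.
                    bytes.write(Integer.parseInt(line.substring(j + 1, j + 3), 16));
                    j += 2;
                } else {
                    bytes.write(c); // printable ASCII stands for itself
                }
            }
            if (!softBreak && i < lines.length - 1) {
                bytes.write('\n'); // hard line break
            }
        }
        // Only now are the raw bytes interpreted using the part's charset.
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String raw = "some text *again*\n\n=97\nBobby\n\n"
                + "On Wed, Feb 8, 2012 at 11:51 AM, Alex Johnson <alex@example.com> wrot=\ne:\n";
        String text = decodeQuotedPrintable(raw, Charset.forName("windows-1252"));
        System.out.println(text);

        // UTF-8 only comes into play when the String is encoded to bytes.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length + " UTF-8 bytes");
    }
}
```

Malformed escapes (for example "=ZZ") would throw a NumberFormatException here; a production decoder would need to be more forgiving.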

Is that what you're looking for?

Bill Shannon
  • Well, the "figure that out yourself" part is exactly what I'm worried about. I still hope to find a library for this purpose... – yegor256 Feb 08 '12 at 21:32
  • In general, that problem can be very difficult, because there are no standards for how the included message is formatted in the text of the new message. Often, but not always, each line of the text of the included message will be preceded by "> ". Often, but not always, the included message will start with a distinctive line as in your example. But different mailers will format that line differently. – Bill Shannon Feb 08 '12 at 22:34
  • (continuing) In the end, you're going to have to decide how "perfect" your solution needs to be and come up with some heuristics that work well enough for the cases you care about. Again, java.util.regex might help. You'll probably find it easier to read the text a line at a time (wrap a StringReader in a BufferedReader and call readLine) and match each line against a pattern, copying the lines you want to keep and throwing away the others. – Bill Shannon Feb 08 '12 at 22:35
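The line-at-a-time approach described in these comments might look roughly like the sketch below. It assumes the input is the already-decoded String (for example from getContent), that the included message begins with an "On ... wrote:" line, and that any line starting with ">" is quoted text. ReplyCleaner and clean are hypothetical names; the patterns would need tuning for the mailers you actually receive from.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ReplyCleaner {

    // Heuristics (assumptions; mailers differ): a Gmail-style attribution line,
    // and quoted lines that start with ">" (optionally after whitespace).
    private static final Pattern ATTRIBUTION = Pattern.compile("On .+ wrote:\\s*");
    private static final Pattern QUOTED = Pattern.compile("\\s*>.*");

    /** Keeps lines up to the first attribution line, skipping quoted lines. */
    static String clean(String text) {
        List<String> kept = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (ATTRIBUTION.matcher(line).matches()) {
                    break; // the included message starts here
                }
                if (QUOTED.matcher(line).matches()) {
                    continue; // quoted line appearing before the attribution
                }
                kept.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected from a StringReader
        }
        // Drop blank lines left over at the end.
        while (!kept.isEmpty() && kept.get(kept.size() - 1).trim().isEmpty()) {
            kept.remove(kept.size() - 1);
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String body = "some text *again*\n\n\u2014\nBobby\n\n"
                + "On Wed, Feb 8, 2012 at 11:51 AM, Alex Johnson <alex@example.com> wrote:\n\n"
                + "> let's try again\n> and again\n";
        System.out.println(clean(body));
    }
}
```

Because it stops at the first attribution line, anything the sender wrote below the quote (bottom-posting) is dropped as well; whether that is acceptable depends on the mail you need to handle.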