4

I'm looking for a Java library which can do the following:

Parse emails in *.eml or *.msg format, extract attachments of type DOC, DOCX, JPEG, PNG, GIF, TXT, XLS, XLSX, PPT and PDF, and convert the attachments to the TIFF format.

It can be either an open source or a commercial library. Alternatively, I'm looking for Linux command line tools that do this. We already tried OpenOffice, but it has too many problems with some document formats.

UPDATE:

What I have found out so far:

For parsing emails and extracting attachments, JavaMail (http://www.oracle.com/technetwork/java/javamail/index.html) is a good choice.

For converting documents, JodConverter (http://code.google.com/p/jodconverter/) is a convenient library. However, it is only a wrapper around OpenOffice, so if OpenOffice has trouble converting a document (and I often do have trouble with OpenOffice), you will have the same trouble with JodConverter.

In conclusion, I have had no luck so far finding a document conversion library implemented in pure Java that handles all common document formats, neither open source nor commercial. It seems to be a real market gap.

markus
  • I know there are tools that will go from image files to TIFF right away, so you're covered there. Now, you can go from DOC to ODF to PDF to TIFF. Similarly with XLS and PDF. As for stringing those tools together.... –  Sep 05 '11 at 08:14
  • Judging by its feature set, this library http://www.coolutils.com/TotalMailConverter does what I need; unfortunately it's not Java. – markus Sep 05 '11 at 08:39
  • I found something at http://www.artofsolving.com/opensource/jodconverter/adoption but it looks a bit outdated. I'll check it out. – markus Sep 05 '11 at 16:24
  • seems to be continued at http://code.google.com/p/jodconverter/ – markus Sep 05 '11 at 16:49

4 Answers

2

RainbowPDF may fit: it's a commercial server-based conversion tool with a Java API.

If you've got a Windows server, have a look at NEEVIA Document Converter Pro. It has some mail functionality.

Apache POI is an interface for reading the content of Microsoft Office documents. You will have to code the layout and image generation components on your own. Nevertheless, it can read the Outlook MSG format.

ChrisGer
  • I checked out the RainbowPDF trial. Expensive but nice. For my requirements it has one disadvantage: it cannot convert to TIFF format on Linux. – markus Sep 09 '11 at 15:18
  • Despite the note on their website (TIFF, PNG, JPG windows only) I got this answer from support: "Rainbow PDF Server Based Converter can run on Solaris 8,9, & 10 and is also able to do conversions to PNG & JPEG". So either ask support about Linux and TIFF or get the trial. – ChrisGer Sep 15 '11 at 09:43
  • I tested the trial and also contacted their support. They told me that they also support Debian. TIFF on Linux is a frequently requested feature, but it is not on their roadmap yet. In conclusion, it's nice software but expensive if you want to use it in a SaaS environment. – markus Sep 15 '11 at 10:09
1

Apache POI - the Java API for Microsoft Documents. However, I don't know how to easily convert a parsed document to TIFF.

Tomasz Nurkiewicz
  • Well, this library is only for Microsoft documents and does not parse attachments out of an email. – markus Sep 05 '11 at 08:40
0

Maybe a mix of different approaches could be useful? Depending on your requirements, it could be possible to use several libraries to convert all the formats you need to handle: Microsoft Office, Adobe PDF, several image formats and plain text files.

I mean, you can create a process that, for each file extracted with JavaMail, recognizes the file's format and continues processing with the conversion mechanism and library suited to it. If the file is an image, try Java Advanced Imaging; if it's a Microsoft Office file, try Apache POI; and so on. For PDF files you can try Apache PDFBox, another good open source solution.
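To make the routing idea concrete, here is a minimal sketch, assuming the file extension alone is enough to pick a converter (in practice the MIME type of the mail part may be more reliable). The names `AttachmentRouter`, `Kind`, `classify` and `convertImageToTiff` are made up for illustration. Only the image branch is implemented, using `javax.imageio`, which as far as I know ships a TIFF writer since Java 9; on older JVMs you would need an extra plugin such as JAI Image I/O. The Office and PDF branches are left out on purpose, because rendering those formats is exactly the hard part discussed in this thread.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Locale;

class AttachmentRouter {

    /** Hypothetical categories; each one would be handled by a different library. */
    enum Kind { IMAGE, OFFICE, PDF, TEXT, UNKNOWN }

    /** Classifies an attachment by its file extension only (an assumption of this sketch). */
    static Kind classify(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Kind.UNKNOWN; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "jpg": case "jpeg": case "png": case "gif":
                return Kind.IMAGE;
            case "doc": case "docx": case "xls": case "xlsx": case "ppt":
                return Kind.OFFICE;
            case "pdf":
                return Kind.PDF;
            case "txt":
                return Kind.TEXT;
            default:
                return Kind.UNKNOWN;
        }
    }

    /**
     * Converts a JPEG/PNG/GIF file to TIFF with the JDK's own ImageIO.
     * Returns false if the source cannot be decoded or no TIFF writer is installed.
     */
    static boolean convertImageToTiff(File source, File target) throws IOException {
        BufferedImage image = ImageIO.read(source); // null if no reader understands the file
        if (image == null) {
            return false;
        }
        return ImageIO.write(image, "tiff", target);
    }

    public static void main(String[] args) {
        String[] samples = {"photo.JPG", "report.docx", "scan.pdf", "notes.txt", "archive.zip"};
        for (String name : samples) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Each `Kind` would then be dispatched to its own converter, so a failure in one library (say, the Office renderer) does not affect the image or PDF paths.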

By the way, if you are not looking only for a Java approach, maybe this thread can help you.

I don't know whether there are better commercial solutions than the ones @ChrisGer mentioned.

Nacho Cougil
-1

Do not waste your time looking at Apache POI: it can only parse the content of Office files and is not suitable for rendering them.

Since there are OpenOffice servers available, I suggest you use one. I also know you can easily use DCOM to talk to Microsoft Office applications, so maybe a Java->DCOM bridge is more up to the task. However, this is not even recommended by Microsoft (so I suppose the JodConverter thing is equally unstable).

parasietje