I am trying to create a web service that will convert a doc/docx to png format.

The problem is that I can't find any library that does what I need: it has to be free and must not depend on Office (the server where the app will run does not have Office installed).

Is there anything that can help me achieve this? Or must I choose between something Office dependent (like Interop, which I have read is a really bad idea on a server) and something that isn't free?

Thanks

Mikhail
relysis
  • The problem with what you want to do is that a png is a picture; a Word document is (a) a string of binary characters or (b) a zip package of XML files. In either case, the Word application is required to lay out the page so that the document is visible, as a document, with all the "bells and whistles" (formatting, line & page breaks, headers, footers, etc.). The only way I know of to make a "picture" of a Word document is to display it on a monitor then make a screen-shot... of EACH page. It might be better to convert to PDF format then work from that? – Cindy Meister Oct 19 '15 at 16:51
  • Damn, that should be closed - we do NOT do product recommendations here. – TomTom Jan 21 '16 at 14:09
  • @TomTom: I'm not looking for a product! There are lots of products I can find on Google! – Cyrus Raoufi Jan 21 '16 at 14:18
  • @CyC0der You are not? Well, the QUESTION is. Did you bother reading it? "Is there anything that can help me in obtaining this? Or must I choose between using something office dependent (like Interop - which btw I read is really bad to be used on server) or something that isn't free?" That is looking for a product recommendation. – TomTom Jan 21 '16 at 14:24
  • @TomTom Oh, damn, sorry, that's my mistake ... – Cyrus Raoufi Jan 21 '16 at 14:28
  • The point here is to find a way to do this on the server without needing Office, and whether that is possible at all. As you can see in my answer, it does seem possible without proprietary tools. For file formats it's rare to find libraries for arbitrary conversions, because you usually lack a good middle representation, although tools like Pandoc show this is not completely impossible. I used PDF as a middle representation here; you can go from there. – LaPingvino Jan 29 '16 at 14:25

6 Answers

I know this is most likely not what you want, since it is not free.

But Aspose can do what you need.

Spire.doc too. Again, not free.

Aspose:

using System;
using System.IO;
using System.Reflection;
using Aspose.Words;
using Aspose.Words.Saving;

string exeDir = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) + Path.DirectorySeparatorChar;
string dataDir = new Uri(new Uri(exeDir), @"../../Data/").LocalPath;

// Open the document.
Document doc = new Document(dataDir + "SaveAsPNG.doc");

// Create an ImageSaveOptions object to pass to the Save method.
ImageSaveOptions options = new ImageSaveOptions(SaveFormat.Png);
options.Resolution = 160;

// Save each page of the document as a separate PNG file.
for (int i = 0; i < doc.PageCount; i++)
{
    options.PageIndex = i;
    // The page index is prefixed to the file name so pages do not overwrite each other.
    doc.Save(string.Format("{0}{1}SaveAsPNG out.Png", dataDir, i), options);
}

Spire.doc (WPF):

using System.Drawing;
using System.IO;
using System.Windows;
using System.Windows.Media.Imaging;
using Spire.Doc;
using Spire.Doc.Documents;

namespace Word2Image
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, RoutedEventArgs e)
        {
            Document doc = new Document("sample.docx", FileFormat.Docx2010);
            BitmapSource[] bss = doc.SaveToImages(ImageType.Bitmap);
            for (int i = 0; i < bss.Length; i++)
            {
                SourceToBitmap(bss[i]).Save(string.Format("img-{0}.png", i));
            }
        }

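        private Bitmap SourceToBitmap(BitmapSource source)
        {
            using (MemoryStream ms = new MemoryStream())
            {
                // Encode the WPF image as PNG into memory.
                PngBitmapEncoder encoder = new PngBitmapEncoder();
                encoder.Frames.Add(BitmapFrame.Create(source));
                encoder.Save(ms);

                // GDI+ needs the stream to stay open for the lifetime of a Bitmap
                // built on it, so return a copy that does not depend on ms.
                using (Bitmap streamBacked = new Bitmap(ms))
                {
                    return new Bitmap(streamBacked);
                }
            }
        }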
    }
}
Gertsen
  • Thank you, but I'm looking for a free way; these libraries are very expensive where I live. – Cyrus Raoufi Jan 21 '16 at 14:20
  • Unfortunately they are expensive everywhere :-( I would like to do the same thing as you, but so far I have not found any free solutions. I'm beginning to doubt that any free solution exists, and that the closest thing to one is Office Interop, which is no good for a service since it requires a lot of resources and is rather slow. – Gertsen Jan 21 '16 at 14:32
  • Is it possible with Interop? It's better than nothing! – Cyrus Raoufi Jan 21 '16 at 14:36
  • It's not officially supported, so YMMV, but I think it is possible, like in this case: http://stackoverflow.com/questions/24830027/issue-with-converting-doc-to-png – Gertsen Jan 21 '16 at 14:48
  • I am trying to convert a docx with arabic text in it and Spire.doc is making a complete mess of it :( Probably it only works fine for languages with LTR orientation :( – sohaiby Aug 12 '17 at 17:04

Yes, complex file-type conversions like this are usually handled well by specialized third-party libraries (like the one mentioned above) or, for example, by DevExpress Document Automation:

using System;
using System.Drawing.Imaging;
using System.IO;
using DevExpress.XtraPrinting;
using DevExpress.XtraRichEdit;

using(MemoryStream streamWithWordFileContent = new MemoryStream()) {
    //Populate the streamWithWordFileContent object with your DOC / DOCX file content

    RichEditDocumentServer richContentConverter = new RichEditDocumentServer();
    richContentConverter.LoadDocument(streamWithWordFileContent, DocumentFormat.Doc);

    //Save
    PrintableComponentLink pcl = new PrintableComponentLink(new PrintingSystem());
    pcl.Component = richContentConverter;
    pcl.CreateDocument();

    ImageExportOptions options = new ImageExportOptions(ImageFormat.Png);

    //Paging
    //options.ExportMode = ImageExportMode.SingleFilePageByPage;
    //options.PageRange = "1";

    pcl.ExportToImage(MapPath(@"~/DocumentAsImageOnDisk.png"), options);
}
Mikhail
  • In newer versions of DevExpress it looks like this: ``public static void docxToImage(string inpath ) { var sourceServer = new RichEditDocumentServer(); sourceServer.LoadDocument(inpath); var pl= new PrintableComponentLink(); pl.PrintingSystemBase =new PrintingSystemBase(); pl.Component = sourceServer; pl.CreateDocument(true); var options = new ImageExportOptions(ImageFormat.Png); pl.ExportToImage("image.png", options); }`` – bh_earth0 Mar 28 '18 at 12:43

Install LibreOffice on your server. The latest versions of LibreOffice have a command line interface that will work for saving your document as a PDF. (libreoffice --headless --convert-to pdf filename.doc[x])

Then use, for example, ImageMagick or LibreOffice Draw's conversion options to convert the PDF to an image.
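
The two steps above can be sketched as a small shell function. This is only a sketch under some assumptions: `soffice` (the LibreOffice binary) and ImageMagick's `convert` are on the PATH, and ImageMagick is able to read PDFs, which it does by delegating to Ghostscript. The names `run`, `doc_to_png` and `DRY_RUN` are made up for illustration; `--outdir` and `-density` are existing options of the respective tools.

```shell
#!/bin/sh
# Sketch: DOC/DOCX -> PDF -> one PNG per page, without Office.

# Print the command when DRY_RUN=1, otherwise execute it (illustrative helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# doc_to_png INPUT OUTDIR [DPI]   (hypothetical helper name)
doc_to_png() {
    input=$1
    outdir=$2
    dpi=${3:-150}
    base=$(basename "$input")
    base=${base%.*}               # strip the .doc/.docx extension

    run mkdir -p "$outdir" || return 1
    # Step 1: LibreOffice writes OUTDIR/<base>.pdf
    run soffice --headless --convert-to pdf --outdir "$outdir" "$input" || return 1
    # Step 2: rasterize every PDF page; %d is replaced by the page index
    run convert -density "$dpi" "$outdir/$base.pdf" "$outdir/$base-%d.png"
}
```

Calling `doc_to_png report.docx out 200` would then leave `out/report.pdf` plus one PNG per page in `out/`. With ImageMagick 7 the binary is `magick` rather than `convert`, and some distributions disable PDF reading in ImageMagick's `policy.xml`, so that may need adjusting on your server.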

LaPingvino

I think the best way to do this for free and without an Office client is a three-step process: convert doc/docx to HTML, convert HTML to PDF, then convert PDF to PNG.

Open XML will get you past the first post. This does not require any installed Office clients, and there is a really good resource that can help you put together the code for this first step (http://openxmldeveloper.org/). However, I don't think it can solve the PDF/PNG problem. Hence,

iTextSharp will do the free PDF conversion for you. But it can't go from PDF to PNG. So lastly,

GhostScript.NET will get you over the finish line.
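
GhostScript.NET is a wrapper around Ghostscript, so the last step (PDF to PNG) can be tried out from the command line before writing any .NET code. A minimal sketch, assuming the `gs` binary is on the PATH; `run`, `pdf_to_png` and `DRY_RUN` are names made up for illustration, while the `gs` switches are standard Ghostscript options.

```shell
#!/bin/sh
# Sketch: rasterize every page of a PDF to PNG with Ghostscript.

# Print the command when DRY_RUN=1, otherwise execute it (illustrative helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pdf_to_png INPUT.pdf OUTDIR [DPI]   (hypothetical helper name)
pdf_to_png() {
    pdf=$1
    outdir=$2
    dpi=${3:-150}

    run mkdir -p "$outdir" || return 1
    # png16m = 24-bit colour PNG; %03d in the output name becomes the page number
    run gs -dSAFER -dBATCH -dNOPAUSE -q -sDEVICE=png16m -r"$dpi" \
        -sOutputFile="$outdir/page-%03d.png" "$pdf"
}
```

For example, `pdf_to_png report.pdf out` should produce `out/page-001.png`, `out/page-002.png`, and so on, one file per page.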

These are the links I collated which seem to be the most useful:


I get the feeling no one has ever done this using free tools. If you succeed, please share your code on GitHub :)

Balah

Consider converting docx to HTML dynamically using PowerTools (or even Office VSTO, which will be fast) and then using wkhtmltopdf (directly, or through Pechkin or similar) to render a PNG from the HTML. I've written about why wkhtmltopdf is better than, for example, iTextSharp here. By the way, I think the best commercial library for working with doc/docx is TxText. It's really awesome; you can do anything you want with it.
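
For the HTML-to-PNG step, the wkhtmltopdf distribution ships a companion binary, `wkhtmltoimage`, that renders straight to an image. A minimal sketch, assuming `wkhtmltoimage` is on the PATH and the HTML has already been produced by the first step; `run`, `html_to_png` and `DRY_RUN` are names made up for illustration.

```shell
#!/bin/sh
# Sketch: render an HTML file to a single PNG with wkhtmltoimage.

# Print the command when DRY_RUN=1, otherwise execute it (illustrative helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# html_to_png INPUT.html OUTPUT.png [WIDTH_PX]   (hypothetical helper name)
html_to_png() {
    html=$1
    png=$2
    width=${3:-1024}

    # --format selects the output type, --width the virtual screen width in pixels
    run wkhtmltoimage --format png --width "$width" "$html" "$png"
}
```

Note that this gives one tall image of the whole HTML document rather than one image per page, so if you need per-page PNGs you would still go through PDF first.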

SalientBrain

If installing a PNG virtual printer on your system is an option, you could consider software such as PDFCreator (which can print to PNG, too), or something similar.

MarcoM