48

Surely I am the 100th user asking this, but after searching through similar topics here and on other websites I still cannot find what I need.

I would like a simple command-line tool for GNU/Linux that converts .doc(x) files to .pdf, BUT the output should look the same as the original.

LibreOffice doesn't seem like a good choice for this because it does not convert well in some cases. I have found a website, freepdfconvert.com, which does the job very well, but I cannot upload sensitive files there since that is a big risk. I'm not saying they would do anything bad with them, but it is how it is.

If I can't find a good tool, maybe I will have to write one myself.

Matthias Braun
  • I have never done this before, but I googled and found this video tutorial: http://www.youtube.com/watch?v=RzxwJAeFMSc It uses an application called [AbiWord](http://www.abiword.org/). There are a lot of posts on this website about converting doc to pdf, but I don't know if any of them are 1:1. I'm sure you can write your own app. I recommend you begin here: https://stackoverflow.com/questions/6011115/doc-to-pdf-using-python. – MarcoGarzini Jan 18 '14 at 11:52

2 Answers

61

Unfortunately, there are no Linux-based converters that guarantee a 1-to-1 conversion from Word (doc/docx) to PDF. The older binary doc format is proprietary, was not traditionally publicly documented, and changed slightly with every release, so for it you must rely on reverse-engineered third-party tools. The newer docx format is Office Open XML, which is documented, but you still depend on third-party developers interpreting it the same way Word does. Microsoft does not port Word/Office to Linux, so Word itself is not available as a fallback.

We found the best open source solution to be LibreOffice (which was forked from OpenOffice.org, which itself was called StarOffice before it was open sourced). It is much more actively developed than AbiWord, which another answer suggested.

The usage from the command line is simple and well documented with plenty of examples:

soffice --headless --convert-to pdf filename.doc

On newer versions you can also use libreoffice instead of soffice.
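
If you have a whole directory of files, the same command can be wrapped in a small loop. This is only a rough sketch, assuming `soffice` is on your PATH; `convert_word_to_pdf` and the `SOFFICE_BIN` override are names I made up for the example, they are not part of LibreOffice. The `--outdir` option tells LibreOffice where to write the PDFs.

```shell
#!/bin/sh
# Batch-convert Word files with headless LibreOffice.
# SOFFICE_BIN is a hypothetical override for this sketch, not a LibreOffice setting.
convert_word_to_pdf() {
    src_dir=$1
    out_dir=$2
    soffice_bin=${SOFFICE_BIN:-soffice}

    mkdir -p "$out_dir" || return 1
    for doc in "$src_dir"/*.doc "$src_dir"/*.docx; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$doc" ] || continue
        "$soffice_bin" --headless --convert-to pdf --outdir "$out_dir" "$doc" || return 1
    done
}
```

You would call it as something like `convert_word_to_pdf ./reports ./pdf`. It stops at the first file that fails to convert, which may or may not be what you want.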

Charles Goodwin
  • This is the best answer here and should be accepted! Also `lowriter` which may be the same as `soffice`. Is it? – Léo Léopold Hertz 준영 Jun 10 '16 at 17:07
  • More detailed examples about the "headless" use of LibreOffice can be seen here: http://stackoverflow.com/a/30465397/359307 – Kurt Pfeifle Dec 14 '16 at 14:47
  • 2
    I think this has changed somewhat - docx and xlsx are part of Office Open XML[1], an open format that is pretty well documented. The tools to convert these to PDF are still few and far between. [1] https://en.wikipedia.org/wiki/Office_Open_XML – Eric Kigathi Apr 20 '17 at 15:48
  • On Mac `brew install --cask libreoffice` then `soffice --headless --convert-to pdf *.odt` – Ax_ Apr 16 '22 at 22:40
  • This was wrong the day it was posted. `docx` is a ZIP package containing well defined XML documents. It was introduced in 2007, 7 years before this answer. The reason PDF is so hard to work with is because it's *NOT* a document format, it's a set of Postscript instructions. Postscript doesn't even have tables. Converting to PDF is essentially a print operation. – Panagiotis Kanavos Jul 19 '22 at 16:45
27

There is also Pandoc.

Pandoc, mainly known for its Markdown-capable processing goodness (for outputting HTML, LaTeX, PDF, EPUB and what-not), has in recent months gained a rather well-working capability to process DOCX input files.
(NOTE: Pandoc only works for DOCX, not for DOC files.)

For its PDF output to work, it requires a working LaTeX installation (with at least one of pdflatex, lualatex and xelatex included). In this case the following simple command should work:

pandoc -o output.pdf -f docx input.docx

Note, however, that the output layout and font styles will not look at all similar to what you would get if you exported the DOCX from Word to PDF. The output will use the styles of a default LaTeX document.
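
If you are not sure which LaTeX engine is installed, you can let a small wrapper pick one and pass it to Pandoc explicitly. This is a sketch only: `find_latex_engine` and `docx_to_pdf` are hypothetical helper names, and it assumes a Pandoc 2.x or later release, where the option is spelled `--pdf-engine` (1.x releases used `--latex-engine` instead).

```shell
#!/bin/sh
# Print the first LaTeX engine found on PATH; fail if none is installed.
find_latex_engine() {
    for engine in xelatex lualatex pdflatex; do
        if command -v "$engine" >/dev/null 2>&1; then
            printf '%s\n' "$engine"
            return 0
        fi
    done
    return 1
}

# Convert a DOCX file to PDF with whichever engine was found.
docx_to_pdf() {
    input=$1
    output=$2
    engine=$(find_latex_engine) || {
        echo "no LaTeX engine found" >&2
        return 1
    }
    # --pdf-engine is the Pandoc 2.x spelling; 1.x releases used --latex-engine.
    pandoc -f docx -o "$output" --pdf-engine="$engine" "$input"
}
```

Usage would be `docx_to_pdf input.docx output.pdf`. The order xelatex, lualatex, pdflatex is just my preference for font handling, not something Pandoc requires.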

You can influence the output style of the LaTeX-generated PDF by using a custom template file like this...

pandoc                               \
  -o output.pdf                      \
  -f docx                            \
  --template=my-latex-template.tmplt \
  input.docx

...but this is a feature more for Pandoc/LaTeX experts to use than for beginners.
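
If you do want to try it, one reasonable starting point is to dump Pandoc's built-in LaTeX template with `-D latex`, edit the copy, and then pass it back with `--template`. A minimal sketch, where the two function names are made up for illustration:

```shell
#!/bin/sh
# Write Pandoc's built-in LaTeX template to a file so it can be edited.
dump_latex_template() {
    template=$1
    pandoc -D latex > "$template"
}

# Convert a DOCX file to PDF using the (edited) template.
docx_to_pdf_with_template() {
    input=$1
    template=$2
    output=$3
    pandoc -f docx -o "$output" --template="$template" "$input"
}
```

For example: `dump_latex_template my-latex-template.tmplt`, change fonts or margins in that file, then `docx_to_pdf_with_template input.docx my-latex-template.tmplt output.pdf`.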

Kurt Pfeifle
  • @RinoTorino: The version of Pandoc I'm using, the most recent one, v1.15.1.1, can read and write DOCX as well as ODT. – Kurt Pfeifle Nov 15 '15 at 21:18
  • 3
    pandoc cannot convert from a doc file, it needs docx. – tumbudu May 05 '16 at 05:31
  • 1
    @knocker: I didn't say it works for DOC, I only mentioned DOCX. But admittedly, this could easily be overlooked. Thanks for the hint -- I'll make it more explicit. – Kurt Pfeifle May 05 '16 at 08:13