11

I am trying to convert a PDF file to Word, Excel and PowerPoint formats. I have already tried a lot of commands like these:

soffice -env:UserInstallation=file:///$HOME/.libreoffice-headless/ --convert-to docx:"Microsoft Word 2007/2010/2013 XML" file.pdf
/usr/bin/soffice --headless --invisible --convert-to docx file.pdf
soffice --infilter="writer_pdf_import" --convert-to doc file.pdf

/usr/bin/libreoffice --headless --invisible --convert-to doc file.pdf
/usr/bin/soffice --headless --convert-to docx:"Microsoft Word 2007/2010/2013 XML" file.pdf

abiword --to=doc file.pdf
unoconv -f doc file.pdf
lowriter --invisible --convert-to doc 'file.pdf'

I always get this error message from soffice/libreoffice/unoconv:

:1: parser error : Document is empty
%PDF-1.7

And this one from abiword:

Unable to init server: Could not connect: Connection refused

** (abiword:6477): WARNING **: clutter failed 0, get a life.
Unable to init server: Could not connect: Connection refused

Every command except abiword produced a doc file, but it was full of garbled characters. I never got a usable file.

I am building a file converter, so I only want a command-line method. I don't want to use a third-party API.

Thank you

Splinteer

2 Answers

15

I managed to do it with soffice. I had to install the libreoffice-pdfimport package. Don't forget to pass --infilter="writer_pdf_import".
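The fix above can be sketched as a small shell helper. This is a minimal sketch, not a definitive recipe: it assumes a Debian/Ubuntu-style system where the PDF import filter comes from the libreoffice-pdfimport package (installed beforehand, for example with `sudo apt-get install libreoffice-pdfimport`), and `pdf_to_docx` is a hypothetical function name chosen for illustration.

```shell
#!/bin/sh
# Sketch: convert a PDF to DOCX using LibreOffice's Writer PDF import filter.
# Assumption: libreoffice-pdfimport is already installed and soffice is in PATH.

# pdf_to_docx INPUT [OUTDIR]
# Hypothetical helper, not part of LibreOffice itself.
pdf_to_docx() {
    input=$1
    outdir=${2:-.}

    # Fail early with a clear message instead of a cryptic parser error.
    if [ ! -f "$input" ]; then
        echo "no such file: $input" >&2
        return 1
    fi
    if ! command -v soffice >/dev/null 2>&1; then
        echo "soffice not found in PATH" >&2
        return 127
    fi

    # --infilter forces the input to be read as a PDF by Writer;
    # without it the file is handed to a filter that cannot parse it.
    soffice --headless --infilter="writer_pdf_import" \
        --convert-to docx --outdir "$outdir" "$input"
}
```

With the function sourced, `pdf_to_docx file.pdf out` should leave the converted document in `out`. As the comments on this answer note, the result tends to be made of many text boxes, so layout is preserved at the cost of easy editing.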

Splinteer
  • Thanks, I was looking for a long time for the correct infilter option to PDFs. May I ask how you knew it? – Tom G. Jun 06 '20 at 03:35
  • @TomG. can't remember now but I did a lot of searches – Splinteer Jun 09 '20 at 14:21
  • 10
    Thanks a lot. It worked like charm. I used: `libreoffice --invisible --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML" input_file.pdf` – Ankur Thakur Jul 02 '20 at 15:13
  • 3
    It converts the PDF into tons of text boxes in order to keep the layout. Is there any way to improve on this? – Tony Tan Jul 25 '20 at 05:25
  • 3
    My problem is the same: the tons of text boxes. Why is it still not possible to convert to a real doc, docx, or odf in 2021? Or to have LibreOffice open the file in Writer with normal formatting instead of in Draw? – Csaba Tenkes May 12 '21 at 10:08
1

Linux has a few apps that can import a pdf as an image: LibreOffice, Okular, Calibre.

But if you want editable text, you need a text-extraction utility such as pdftotext, which ships with poppler-utils. The terminal command is:

pdftotext input.pdf output.txt

After that, import the txt file into a word processor and complete the final editing and formatting.
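For a command-line-only workflow, the text-extraction route can be chained into a single step. The following is a rough sketch under stated assumptions: it uses pdftotext from poppler-utils for the extraction and soffice to turn the plain text into a docx, and `pdf_to_docx_via_text` is a hypothetical helper name. Only the text survives; images, tables and most formatting are lost.

```shell
#!/bin/sh
# Sketch: PDF -> plain text -> DOCX, trading layout for editable text.
# Assumptions: pdftotext (poppler-utils) and soffice are both in PATH.

# pdf_to_docx_via_text INPUT [OUTDIR]
# Hypothetical helper for illustration only.
pdf_to_docx_via_text() {
    input=$1
    outdir=${2:-.}

    if [ ! -f "$input" ]; then
        echo "no such file: $input" >&2
        return 1
    fi
    for tool in pdftotext soffice; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found in PATH" >&2
            return 127
        fi
    done

    # Intermediate text file named after the input, e.g. report.pdf -> report.txt
    txt="$outdir/$(basename "$input" .pdf).txt"

    # -layout asks pdftotext to keep the original physical layout where it can.
    pdftotext -layout "$input" "$txt" || return 1

    # Writer opens the text file and saves it as docx next to it.
    soffice --headless --convert-to docx --outdir "$outdir" "$txt"
}
```

The resulting docx contains ordinary paragraphs rather than text boxes, which makes it easier to edit but means the formatting has to be rebuilt by hand.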

TankorSmash
rob grune