7

I am trying to convert a PDF book to an AsciiDoc document. I tried the following command:

pandoc -s s.pdf -t asciidoc -o example28.txt

I got an "Unknown reader" error:

q@q-ABRA-A5-V12-1:~/Downloads$ pandoc -s s.pdf -t asciidoc -o example28.txt
pandoc: Unknown reader: pdf
Pandoc can convert to PDF, but not from PDF.

How can I fix this, or is there another way to convert from PDF to AsciiDoc?

my-lord
  • pandoc doesn't read PDFs; it only produces them. But you could try `less s.pdf | pandoc -t asciidoc` – mb21 Sep 05 '18 at 12:51
  • When I try this command I get "pandoc: Unknown reader: plain" error. – my-lord Sep 05 '18 at 12:52
  • Ah right, if you leave out the `-f`, it will default to markdown... but you probably want a dedicated tool anyway. Stack Overflow is probably the wrong place to ask for that, though. It also depends on your platform / needs. – mb21 Sep 05 '18 at 12:54
  • See also this more generic question: [Python module for converting PDF to text](https://stackoverflow.com/questions/25665/python-module-for-converting-pdf-to-text) which has many more answers. – Paul Rougieux Jun 08 '20 at 09:07
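
The error above comes down to which readers pandoc ships with. As a rough sketch (not part of the original thread), the following Python helper checks a format name against the output of `pandoc --list-input-formats`. The function names `parse_formats`, `can_read`, and `installed_readers` are made up for illustration, and the last one assumes `pandoc` is on the PATH and recent enough to support that flag.

```python
import subprocess


def parse_formats(listing: str) -> set:
    """Turn one-name-per-line output into a set of reader names."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def can_read(fmt: str, listing: str) -> bool:
    """Return True if `fmt` is among the reader names in `listing`."""
    return fmt in parse_formats(listing)


def installed_readers() -> str:
    """Ask the local pandoc for its input formats (assumes pandoc is on PATH)."""
    result = subprocess.run(
        ["pandoc", "--list-input-formats"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `can_read("pdf", installed_readers())` before running a conversion should tell you up front whether a given `-f` value (or file extension) has any chance of working, instead of finding out from the "Unknown reader" message.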

1 Answer

7

Have you tried pdf2txt? It's one of the tools provided by pdfminer: https://pypi.org/project/pdfminer/

tidel
  • Seems to go to HTML, and then you can use pandoc to go from HTML to AsciiDoc: `pdf2txt.py -t html input.pdf | pandoc -f html -t asciidoc` – mb21 Sep 05 '18 at 12:56
  • Thanks a lot. I have converted the PDF to AsciiDoc, but now I have an extra-newline problem, which is probably caused by extra
    blocks in the HTML. How can I fix this problem? From: https://i.imgur.com/QJ3Mx0n.png To: https://i.imgur.com/XoURhd9.png
    – my-lord Sep 06 '18 at 08:19
  • As of 2020, PDFMiner is not actively maintained. This is the community maintained fork: [pdfminer.six](https://github.com/pdfminer/pdfminer.six). – Paul Rougieux Jun 08 '20 at 08:58
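
For anyone who wants to script the two-step pipeline from the comments, here is a minimal Python sketch. It assumes `pdf2txt.py` (from pdfminer or pdfminer.six) and `pandoc` are both on the PATH. The names `collapse_blank_lines` and `pdf_to_asciidoc` are hypothetical helpers, not part of either tool. The blank-line cleanup is only a guess at the extra-newline problem mentioned above: it squeezes runs of empty lines, but it will not rejoin paragraphs that were split line by line.

```python
import re
import subprocess


def collapse_blank_lines(text: str) -> str:
    """Squeeze every run of two or more blank lines into a single blank line."""
    # A "blank" line may still contain spaces or tabs, so allow those.
    return re.sub(r"\n(?:[ \t]*\n){2,}", "\n\n", text)


def pdf_to_asciidoc(pdf_path: str) -> str:
    """Run pdf2txt.py -> HTML -> pandoc -> AsciiDoc, then tidy blank lines."""
    # Step 1: PDF to HTML on stdout, as in `pdf2txt.py -t html input.pdf`.
    html = subprocess.run(
        ["pdf2txt.py", "-t", "html", pdf_path],
        capture_output=True,
        check=True,
    ).stdout
    # Step 2: HTML on stdin to AsciiDoc, as in `pandoc -f html -t asciidoc`.
    asciidoc = subprocess.run(
        ["pandoc", "-f", "html", "-t", "asciidoc"],
        input=html,
        capture_output=True,
        check=True,
    ).stdout
    return collapse_blank_lines(asciidoc.decode("utf-8"))
```

Something like `open("example28.txt", "w", encoding="utf-8").write(pdf_to_asciidoc("s.pdf"))` would then write the result to a file. How clean the output is depends heavily on the PDF's layout, so expect to adjust the cleanup step for your book.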