How can I convert PDF to AsciiDoc using pandoc?

Question

I am trying to convert a PDF book to an AsciiDoc document. I have tried the following command:

    pandoc -s s.pdf -t asciidoc -o example28.txt

I got an "Unknown reader" error:

    q@q-ABRA-A5-V12-1:~/Downloads$ pandoc -s s.pdf -t asciidoc -o example28.txt
    pandoc: Unknown reader: pdf
    Pandoc can convert to PDF, but not from PDF.

How can I fix this, or is there another way to convert from PDF to AsciiDoc?

pandoc doesn't read PDFs, it only produces them. But you could try `less s.pdf | pandoc -t asciidoc` – mb21, Sep 05 '18 at 12:51
When I try this command I get a "pandoc: Unknown reader: plain" error. – my-lord, Sep 05 '18 at 12:52
Ah right, if you leave out the `-f`, it will default to markdown... but you probably want a dedicated tool anyway. Stack Overflow is probably the wrong place to ask for that, though; it also depends on your platform and needs. – mb21, Sep 05 '18 at 12:54
See also this more generic question: [Python module for converting PDF to text](https://stackoverflow.com/questions/25665/python-module-for-converting-pdf-to-text) which has many more answers. — Paul Rougieux, Jun 08 '20 at 09:07

score 7 · Accepted Answer · answered Sep 05 '18 at 11:55

Have you tried pdf2txt? https://pypi.org/project/pdfminer/ It's one of the tools provided there.

tidel

pdf2txt seems to be able to output HTML, and then you can use pandoc to go from HTML to AsciiDoc: `pdf2txt.py -t html input.pdf | pandoc -f html -t asciidoc` – mb21 Sep 05 '18 at 12:56
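
For repeated use, the pipeline from the comment above could be wrapped in a small shell function. This is only a sketch: the function name `pdf_to_asciidoc` is hypothetical, and it assumes that `pdf2txt.py` (from pdfminer or pdfminer.six) and `pandoc` are both on your `PATH`.

```shell
# Hypothetical helper (not part of pdfminer or pandoc):
# convert a PDF to AsciiDoc through an intermediate HTML stream.
# Usage: pdf_to_asciidoc INPUT.pdf OUTPUT
pdf_to_asciidoc() {
    # Fail early with a readable message if the input file is missing.
    if [ ! -f "$1" ]; then
        echo "pdf_to_asciidoc: no such file: $1" >&2
        return 1
    fi
    # pdf2txt.py writes HTML to stdout; pandoc reads it from stdin
    # and writes AsciiDoc to the output file given as the second argument.
    pdf2txt.py -t html "$1" | pandoc -f html -t asciidoc -o "$2"
}
```

With the file names from the question, this would be called as `pdf_to_asciidoc s.pdf example28.txt`.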
Thanks a lot. I have converted the PDF to AsciiDoc, but I have a problem with extra newlines, which is probably caused by extra blocks in the HTML. How can I fix this? From: https://i.imgur.com/QJ3Mx0n.png To: https://i.imgur.com/XoURhd9.png – my-lord Sep 06 '18 at 08:19
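
One possible cleanup, assuming the extra newlines show up as runs of consecutive blank lines in the generated AsciiDoc (the screenshots alone do not confirm this), is to squeeze each run down to a single blank line. The function name `squeeze_blank_lines` is made up for this sketch; it relies only on standard `awk`.

```shell
# Hypothetical filter: collapse every run of blank (or whitespace-only)
# lines into exactly one empty line; non-blank lines pass through unchanged.
squeeze_blank_lines() {
    awk '
        NF       { blank = 0; print; next }  # non-blank line: reset counter, print it
        !blank++ { print "" }                # first blank line of a run: keep one
    '
}
```

It reads from stdin and writes to stdout, so it could be appended to the conversion pipeline, for example `pdf2txt.py -t html input.pdf | pandoc -f html -t asciidoc | squeeze_blank_lines`. Note that it would also squeeze blank lines inside literal or listing blocks, which may not be what you want.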

As of 2020, PDFMiner is not actively maintained. This is the community maintained fork: [pdfminer.six](https://github.com/pdfminer/pdfminer.six). – Paul Rougieux Jun 08 '20 at 08:58
