Convert PDF to HTML using python and pdfkit

Question

On [this page](https://www.adobe.com/acrobat/hub/how-to/how-to-convert-pdf-to-html), Adobe describes converting PDF to HTML using pdfkit.

They use the `pdfkit.from_pdf(...)` method:

> This script uses the ‘pdfkit’ library to convert the PDF file to HTML. To use this script, you will need to install the ‘pdfkit’ library...

When I try to use this method, I get an error:

```
Traceback (most recent call last):
  File "C:\TestPdfToHtml\script.py", line 7, in <module>
    html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
                ^^^^^^^^^^^^^^^
AttributeError: module 'pdfkit' has no attribute 'from_pdf'. Did you mean: 'from_url'?
```

How can I resolve this problem?

Below is the full script:

```python
import pdfkit
# Read the PDF file
pdf_file = open('test2.pdf', 'rb')
# Convert the PDF to HTML
html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
# Close the PDF file
pdf_file.close()
```
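The `AttributeError` means the installed module simply has no attribute named `from_pdf`. A cheap way to check what a module really exposes, before trusting a tutorial, is to list its public converter functions. The helpers below are a hypothetical sketch (`public_converters` and `require_attr` are not part of pdfkit); with pdfkit installed, `public_converters(pdfkit)` should list `from_file`, `from_string`, and `from_url`, none of which reads PDF input.

```python
import types


def public_converters(module: types.ModuleType, prefix: str = "from_") -> list[str]:
    """Return the sorted names of callables in `module` whose names start with `prefix`."""
    return sorted(
        name
        for name in dir(module)
        if name.startswith(prefix) and callable(getattr(module, name))
    )


def require_attr(module: types.ModuleType, name: str):
    """Fetch module.<name>, failing with a message that lists the available converters."""
    if not hasattr(module, name):
        options = ", ".join(public_converters(module)) or "none"
        raise AttributeError(
            f"{module.__name__} has no {name!r}; available converters: {options}"
        )
    return getattr(module, name)


if __name__ == "__main__":
    # Stand-in module mimicking pdfkit's public converters (hypothetical stubs).
    fake = types.ModuleType("fake_pdfkit")
    fake.from_url = fake.from_file = fake.from_string = lambda *args, **kwargs: None
    print(public_converters(fake))  # ['from_file', 'from_string', 'from_url']
```

Against the real library you would pass the imported `pdfkit` module instead of the stand-in.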

Hello, based on the library description [Wkhtmltopdf python wrapper to convert html to pdf](https://github.com/JazzCore/python-pdfkit), are you sure this is the right tool for this kind of conversion, i.e. converting from PDF to HTML? — Franco Gil, Mar 16 '23 at 13:49
[That](https://www.adobe.com/acrobat/hub/how-to/how-to-convert-pdf-to-html) seems to be the only page on the entire internet claiming that `pdfkit` has a `from_pdf()` function. A thing you can try is seeing if its `from_file()` function (which exists) happens to open a PDF, something I would not bet on. — tevemadar, Mar 16 '23 at 13:59
Related: https://stackoverflow.com/questions/34837707/how-to-extract-text-from-a-pdf-file — tevemadar, Mar 16 '23 at 14:02
The `pdfkit` documentation lists only three functions, `from_url`, `from_file`, and `from_string`, and says nothing about converting PDF to HTML. Maybe Adobe is trolling... — Duzy, Mar 16 '23 at 14:06
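Since pdfkit only goes from HTML to PDF, one alternative in the spirit of the related text-extraction question is to pull the text out of each page with a PDF library and wrap it in HTML yourself. This is a rough sketch under assumptions: it relies on the third-party `pypdf` package (`PdfReader` and `page.extract_text()`), `pdf_to_html` and `pages_to_html` are hypothetical names, and it keeps only the text, not layout, fonts, or images.

```python
import html


def pages_to_html(pages: list[str], title: str = "Converted PDF") -> str:
    """Wrap each page's plain text in an escaped <section> of a minimal HTML page."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
    ]
    for number, text in enumerate(pages, start=1):
        # Escape &, <, > so PDF text cannot inject markup; <pre> keeps line breaks.
        parts.append(f'<section id="page-{number}"><pre>{html.escape(text)}</pre></section>')
    parts.append("</body></html>")
    return "\n".join(parts)


def pdf_to_html(pdf_path: str, html_path: str) -> None:
    """Extract text with pypdf (assumed installed) and write it out as HTML."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(pdf_path)
    # extract_text() may return little or nothing for scanned, image-only pages.
    pages = [page.extract_text() or "" for page in reader.pages]
    with open(html_path, "w", encoding="utf-8") as out:
        out.write(pages_to_html(pages, title=pdf_path))
```

With the question's file names this would be called as `pdf_to_html("test2.pdf", "my_html_file.html")`. Splitting the HTML building out of the pypdf call keeps the wrapping logic testable without a PDF library installed.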

score -4 · Answer 1 · answered Mar 16 '23 at 14:17


Maybe newer versions of pdfkit no longer support `pdfkit.from_pdf`. You can try `pdfkit.from_file()` instead:

```python
pdfkit.from_file(pdf_file, html_file)
```

Hope this helps.

answered Mar 16 '23 at 14:17

PforPython


Downvote. I'm pretty sure you didn't try this. It doesn't work. When I try this with an actual PDF input file (even an extremely simple one generated by pdfkit itself), I get: `wkhtmltopdf exited with non-zero code 1. error: Exit with code 1, due to unknown error.` I'm pretty sure pdfkit just can't do this. I wonder why Adobe posted that nonsense... – jcsahnwaldt Reinstate Monica May 24 '23 at 17:36
This doesn't appear to work: `from_file` expects an HTML file as input and produces a PDF file as output. – Richard Stokes Jun 05 '23 at 08:39
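Both comments point at the same root cause: `from_file` passes its input to wkhtmltopdf as an HTML document and writes a PDF, so a PDF given as input fails. A small guard can catch that mistake before wkhtmltopdf runs. This is a hypothetical sketch: `looks_like_pdf` and `html_file_to_pdf` are illustrative names, it assumes pdfkit and the wkhtmltopdf binary are installed, and it only checks for the `%PDF-` signature at the very start of the file, which is typical but not guaranteed for every PDF.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Return True if the file starts with the usual PDF header bytes."""
    with open(path, "rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE


def html_file_to_pdf(html_path: str, pdf_path: str) -> None:
    """Convert HTML to PDF with pdfkit, refusing input that is already a PDF."""
    if looks_like_pdf(html_path):
        raise ValueError(
            f"{html_path!r} is a PDF; pdfkit.from_file converts HTML to PDF, not the reverse"
        )
    import pdfkit  # third-party wrapper; also needs the wkhtmltopdf binary on the system

    # from_file(input, output_path=None, ...): the input is HTML, the output is a PDF.
    pdfkit.from_file(html_path, pdf_path)
```

Failing early with a `ValueError` gives a clearer message than wkhtmltopdf's generic "Exit with code 1, due to unknown error".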
