On this site, Adobe writes about converting PDF to HTML using pdfkit.

They use the pdfkit.from_pdf(...) method.

This script uses the ‘pdfkit’ library to convert the PDF file to HTML. To use this script, you will need to install the ‘pdfkit’ library...

When I try to use this method, I get an error:

Traceback (most recent call last):
  File "C:\TestPdfToHtml\script.py", line 7, in <module>
    html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
                ^^^^^^^^^^^^^^^
AttributeError: module 'pdfkit' has no attribute 'from_pdf'. Did you mean: 'from_url'?

How can I resolve this problem?

Below is the full script:

import pdfkit
# Read the PDF file
pdf_file = open('test2.pdf', 'rb')
# Convert the PDF to HTML
html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
# Close the PDF file
pdf_file.close()
Nate Anderson
Duzy
  • What does the documentation of package `pdfkit` say? – Jorj McKie Mar 16 '23 at 13:46
  • Hello, based on the library description, [Wkhtmltopdf python wrapper to convert html to pdf](https://github.com/JazzCore/python-pdfkit), are you sure this is the correct tool for this kind of conversion, i.e. converting from PDF to HTML? – Franco Gil Mar 16 '23 at 13:49
  • [That](https://www.adobe.com/acrobat/hub/how-to/how-to-convert-pdf-to-html) seems to be the only page on the entire internet claiming that `pdfkit` has a `from_pdf()` function. One thing you can try is seeing whether its `from_file()` function (which exists) happens to open a PDF, though I would not bet on it. – tevemadar Mar 16 '23 at 13:59
  • Related: https://stackoverflow.com/questions/34837707/how-to-extract-text-from-a-pdf-file – tevemadar Mar 16 '23 at 14:02
  • The documentation of package `pdfkit` lists only 3 functions, `from_url`, `from_file`, and `from_string`, and says nothing about converting PDF to HTML. Maybe Adobe is trolling... – Duzy Mar 16 '23 at 14:06
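
The point made in the comments above, that the package exposes only a handful of `from_*` converters and `from_pdf` is not among them, can be checked by inspecting a module's attributes before calling anything. The sketch below is only an illustration: `public_converters` is a hypothetical helper, and `fake_pdfkit` is a stand-in object assumed to mirror the three converter names from the pdfkit documentation, so the snippet runs without pdfkit or wkhtmltopdf installed.

```python
import types


def public_converters(module):
    """Return the sorted names of callables on `module` that start with 'from_'."""
    return sorted(
        name
        for name in dir(module)
        if name.startswith("from_") and callable(getattr(module, name))
    )


# Stand-in for the real package (an assumption for illustration only):
# it carries the three converter names the pdfkit documentation lists.
fake_pdfkit = types.SimpleNamespace(
    from_url=lambda *args, **kwargs: None,
    from_file=lambda *args, **kwargs: None,
    from_string=lambda *args, **kwargs: None,
)

converters = public_converters(fake_pdfkit)
has_from_pdf = hasattr(fake_pdfkit, "from_pdf")

print(converters)
print(has_from_pdf)
```

With the real package installed, passing the imported `pdfkit` module to `public_converters` instead of the stand-in should show whether `from_pdf` exists in your version; the traceback in the question suggests it does not.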

1 Answer

Maybe the newer version of pdfkit does not support pdfkit.from_pdf. You can try pdfkit.from_file() instead:

pdfkit.from_file(pdf_file, html_file)

Hope this helps.

PforPython
  • Downvote. I'm pretty sure you didn't try this. It doesn't work. When I try this with an actual PDF input file (even an extremely simple one generated by pdfkit itself), I get: `wkhtmltopdf exited with non-zero code 1. error: Exit with code 1, due to unknown error.` I'm pretty sure pdfkit just can't do this. I wonder why Adobe posted that nonsense... – jcsahnwaldt Reinstate Monica May 24 '23 at 17:36
  • This doesn't appear to work; `from_file` appears to expect an HTML file as input and a PDF file as output. – Richard Stokes Jun 05 '23 at 08:39
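
Since the comments above report that `from_file` wants HTML in and PDF out, one way to fail early with a clearer message than the wkhtmltopdf exit code is to look at the input file's first bytes before calling the converter. This is a minimal sketch using only the standard library; `looks_like_pdf` and `check_html_input` are hypothetical helper names, not part of pdfkit, and the check relies on PDF files beginning with the `%PDF-` header.

```python
from pathlib import Path

# PDF files begin with this header, followed by the version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path):
    """Return True if the file at `path` begins with the PDF header bytes."""
    with open(path, "rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


def check_html_input(path):
    """Raise ValueError instead of handing a PDF to an HTML-to-PDF converter."""
    if looks_like_pdf(path):
        raise ValueError(f"{path} is a PDF; an HTML-to-PDF converter cannot read it")
    return Path(path)
```

Calling `check_html_input('test2.pdf')` on the file from the question would raise `ValueError` if that file really is a PDF, which makes it obvious that the input is the wrong way round for this library.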