I am using the great tool wkhtmltopdf to render HTML into nice PDF documents (executed from PHP). I now need to know whether the HTML page I am about to render will fit onto a single page in the PDF. I found out that, with my current settings, wkhtmltopdf fits roughly 1100px of page height onto one PDF page.

I came up with the idea of rendering with wkhtmltoimage first. I can then read the dimensions of the resulting image, which works fine.
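Once the rendered height is known, the one-page check itself is simple arithmetic. A minimal sketch in Python (the question runs from PHP, but the logic ports directly), assuming the roughly 1100px per page figure observed above. The function names and the constant are hypothetical, and the real threshold depends on the wkhtmltopdf settings in use:

```python
import math

# Assumption: about 1100px of content height fits on one PDF page,
# as observed with the asker's wkhtmltopdf settings.
PAGE_HEIGHT_PX = 1100


def pages_needed(content_height_px, page_height_px=PAGE_HEIGHT_PX):
    """Estimate how many PDF pages a document of the given pixel height needs."""
    if content_height_px <= 0:
        return 0
    return math.ceil(content_height_px / page_height_px)


def fits_on_one_page(content_height_px, page_height_px=PAGE_HEIGHT_PX):
    """Return True if the estimated page count is at most one."""
    return pages_needed(content_height_px, page_height_px) <= 1
```

This is only an estimate: page breaks are not placed at exact pixel offsets, so a document close to the threshold may still spill onto a second page.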

Unfortunately, rendering to an image takes a long time, and since I have to do it in a loop, I'm looking for a faster way to do this.

Do you know of any tool that will render HTML with WebKit and give me back the dimensions of the resulting document? It might also be possible to use JavaScript as a helper (write the dimensions into a div that can be extracted afterwards), but I can't find a tool that gets the job done.

Any ideas or alternative approaches?

halfer
Niksac
  • An idea: you could invalidate all the images in the document, which might make it render faster. Did you try substituting all occurrences of jpeg/gif/png with "invalid"? Just an idea. – Janus Troelsen Mar 31 '12 at 18:37
  • @Janus - that could make the layout break apart. And what if a whole page is one image, e.g. a graph? Not a good idea... – Michal Mar 31 '12 at 18:39
  • You would have to patch wkhtmltopdf and re-patch Qt to be able to do that, otherwise it won't work. I'm assuming that's a lot of work. There is at least one web service based on wkhtmltopdf that does what you want. It has an API at http://www.htm2pdf.co.uk/html-to-pdf-api which could help you. – user1914292 Apr 13 '13 at 16:49

1 Answer

As far as I know, there is no reliable way to count the pages before generating the PDF file itself. This is due to how WebKit lays out lines and where it places page breaks (more info here).

However, you could use another tool to count the pages in an already generated PDF file.
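As a rough illustration of counting pages in an already generated file, here is a minimal sketch in Python (not the FPDF/FPDI approach, and not tied to any particular library). It simply counts the page objects in the raw PDF bytes. This is a heuristic, and it assumes the page objects are stored uncompressed in the file; PDFs that pack their objects into compressed object streams need a real PDF parser instead. The function names are hypothetical:

```python
import re

# Matches the "/Type /Page" entry of a page object, but not "/Type /Pages",
# which marks a page-tree node rather than an actual page.
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_pdf_pages(pdf_bytes):
    """Heuristically count pages by scanning raw PDF bytes for page objects."""
    return len(_PAGE_OBJECT.findall(pdf_bytes))


def pdf_fits_on_one_page(pdf_bytes):
    """Return True if the scanned PDF contains exactly one page object."""
    return count_pdf_pages(pdf_bytes) == 1
```

To use it, read the generated file in binary mode (`open(path, "rb").read()`) and pass the bytes to `count_pdf_pages`.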

You could also look at my other, more extensive post on wkhtmltopdf: HTML2PDF in PHP - convert utilities & scripts - examples & demos. You may find useful information there; the code in that post also counts the pages of an already generated PDF file (as part of encrypting it, using an FPDF/FPDI implementation).

Michal
  • Hi, this is what I am doing with wkhtmltoimage. Unfortunately it is too slow. I would love to find a tool that does the rendering with no output other than the height in pixels. – Niksac Mar 31 '12 at 19:08
  • No, you are converting the content to bitmap data, which is not the same. You could also have wkhtmltopdf render the page at a low DPI; that might improve the speed. – Michal Mar 31 '12 at 19:13
  • Hi, I just tried this out. I put a lot of effort into making wkhtmltopdf as fast as possible. Here is what I tried: disabled images, lowered the DPI, disabled JavaScript, and disabled PDF compression. I also placed the source and the output file on a RAM disk. After all this optimization it runs at roughly the same speed as before. After it starts, there is a lag with no output at the beginning. As soon as the first output appears (loading pages 1/6), things go pretty fast. Unfortunately it's still too slow for my purposes. – Niksac Apr 01 '12 at 11:38
  • Well, it's hard to tell you anything more about this issue. It depends on your server config, your hardware, the documents you are working with, and 1000 other things. – Michal Apr 01 '12 at 11:47
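For reference, the speed-oriented options tried in the comments above can be sketched as a wkhtmltopdf invocation. This is a minimal sketch in Python rather than the PHP the question uses; it assumes wkhtmltopdf is on the PATH, the helper names are hypothetical, and the DPI default is an arbitrary example value. As reported above, these options did not give a noticeable speedup in the asker's case, and disabling images or changing the DPI may alter the layout and therefore the page count:

```python
import subprocess


def build_fast_wkhtmltopdf_command(source, output, dpi=72):
    """Build a wkhtmltopdf command line with the speed-oriented options enabled."""
    return [
        "wkhtmltopdf",
        "--quiet",                 # suppress progress output
        "--no-images",             # do not load or print images
        "--disable-javascript",    # do not run scripts in the page
        "--dpi", str(dpi),         # lower DPI (example value, an assumption)
        "--no-pdf-compression",    # skip compressing PDF objects
        source,
        output,
    ]


def render_fast(source, output, dpi=72):
    """Run wkhtmltopdf with the options above; raises if the tool fails."""
    subprocess.run(build_fast_wkhtmltopdf_command(source, output, dpi), check=True)
```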