60

Is there any PHP PDF library that can replace placeholder variables in an existing PDF, ODT or DOCX document, and generate a PDF file as the end result, without screwing up the layout?

Requirements:

  • Needs no 3rd party web service

  • Ability to run on shared web hosting would be ideal (no binary installations / packages required)

Mind you, a library that is able to load an existing PDF file and insert text programmatically at a specific position is not enough for my use case.

As far as my research shows, there is no library that can do this:

  • TCPDF can only generate documents from scratch

  • FPDI can read existing PDF templates, but can only add contents programmatically (no template variable replacement)

  • There are various DOCX/ODT template libraries out there but they don't output PDF

PHPDOCx claims to be able to do exactly what I need - but they don't offer a trial version and I'm not going to buy a cat in a bag, especially not when there seems to be no other product on the web that does this. I find it hard to believe they can do this without problems - if you have successfully done this using the product, please drop a line here.

Am I overlooking something?

Is there a way to do this using PDF forms? I am creating the source documents in OpenOffice 3.

I may be able to use standard Linux commands (pdftk is available for example, trying that out right now.)

Update: *Argh!* I was called out of the office and the bounty expired in the meantime. Starting a new bounty: As far as my testing shows, no solution works for me perfectly yet.

Update II: I will be looking into the pdftk approach soon, but I am also starting another bounty for one more round of collecting additional input. This question has now seen 1300 rep points in bounties, must be some kind of a record :)

Pekka
  • Which size of placeholder variables are at stake? If it's just single words (or < 30 bytes) and not lengthy paragraphs, then placeholder preparation and a simple search and replace approach would do. – mario Dec 23 '10 at 14:32
  • @mario they may be multi-line paragraphs but I could see to it that they are split into separate lines. – Pekka Dec 23 '10 at 14:34
  • Is there any reason why you need to parse out placeholders? What if there was just space in the PDF for the generated content. Probably wouldn't work if you don't know the length of the content though. – Jeremy Dec 23 '10 at 21:35
  • @Pekka, why can't you use a cheap VPS (www.lowendbox.com) and remove the limitation of not being able to install packages/libraries? That would make things easier. Is there a reason for shared hosting? – shamittomar Dec 26 '10 at 15:25
  • @shamittomar it's for a client, and I want to keep the technical and bookkeeping overhead as small as possible. A VPS would indeed make this easy but add a whole new parameter to the equation (A server needs administration, installation, continuous security updates, .... It can be down while the site needing the PDF services is still up...) that I want to avoid at all costs – Pekka Dec 26 '10 at 16:01
  • Pekka, if the PDF is a big one and text replacement takes too long, won't the server kill your PHP script? You are better off offloading the job to a daemon running on the server. – BZ1 Jan 28 '11 at 05:25
  • @Pekka - Small note, FPDI can sit on top of TCPDF as well as FPDF IIRC, so via that TCPDF is capable of working with templates in the FPDI fashion. – Orbling Jan 30 '11 at 14:50

10 Answers

16

This is not very practical, but for completeness: If you already have an ODT template, then you might very well retain that as template. Modifying the OpenDocument content.xml and replacing placeholders therein is pretty simple. If so, you could use unoconv or pyodconverter to transform the ODT into a final PDF.

unoconv -f pdf -o final.pdf template.odt

Very obviously this requires a full OpenOffice setup (UNO and Writer) on the webserver. And obviously not every webhoster would go with that! haha. Even if it's simple on any Debian or Fedora setup. The execution speed would probably not be stellar either. But then it might be the cleanest approach, since OOo governs both formats way better than any PHP class ever could.
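
For anyone wondering what "modifying content.xml" looks like in practice, here is a minimal sketch. It is written in Python purely for illustration (the same steps should map onto PHP's ZipArchive), and it assumes placeholders of the form `{{name}}`, which is a made-up convention, not something ODT defines. The function name `fill_odt_template` is hypothetical. An ODT file is a ZIP archive, so the sketch copies every entry and only rewrites content.xml:

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_odt_template(template_bytes: bytes, values: dict) -> bytes:
    """Return a copy of an ODT file with {{name}} placeholders replaced in content.xml.

    The {{name}} syntax is an assumption made for this sketch.
    """
    source = zipfile.ZipFile(io.BytesIO(template_bytes))
    output = io.BytesIO()
    with zipfile.ZipFile(output, "w") as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for name, value in values.items():
                    # escape() keeps characters such as & and < valid inside the XML
                    text = text.replace("{{" + name + "}}", escape(value))
                data = text.encode("utf-8")
            # Passing the original ZipInfo keeps entry order and per-entry compression
            # (the "mimetype" entry is expected to stay first and uncompressed).
            target.writestr(item, data)
    return output.getvalue()
```

Write the returned bytes to a file and pass that file to the unoconv command above. One caveat: a word processor may split a placeholder across several XML spans (for example after a formatting change in the middle of it), in which case a plain string replace will not find it.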

mario
  • Yeah, this would be my favourite solution but I'm fairly sure the web host doesn't allow it. Still, for anybody who can use it, this is the cleanest way – Pekka Dec 23 '10 at 14:52
  • Just need a VPS to do it. Hmmm nice solution. +1. – shamittomar Dec 26 '10 at 15:23
  • Besides, you need `exec()`, which is quite likely not available on a shared host. – Bram Schoenmakers Dec 27 '10 at 23:01
  • @BramSchoenmakers: Depends. I have an extremely inexpensive and professional shared hoster. Uses suexec, fastcgi, suhosin instead of safe_mode etc. And they would most likely install Openoffice for me. But of course, it's a different story with mass hosters. | Also in this case you could use a socket connection to UNO. – mario Dec 27 '10 at 23:06
  • Persuading the client to stick with odt and stack up a separate server for publishing pdf files could be nice too. – naugtur Jan 28 '11 at 12:52
  • where can i find content.xml? – Ali Nouman Dec 04 '14 at 13:05
  • @shamittomar where could i find content.xml? – Ali Nouman Dec 05 '14 at 05:17
  • @Human love: Please avoid using comments for support/research requests. Google the [ODT format](http://de.wikipedia.org/wiki/OpenDocument) to find out about the structure and XML meta/content entrails. – mario Dec 05 '14 at 09:59
13

Pekka,

I looked into this previously. I think you can use pdftk (a command line utility) to fill in a PDF form using FDF/XFDF data files, which you could easily generate from within PHP. That was the best option I've seen so far, though there may well be a native library.

pdftk is quite useful in general, worth having a look at.

Update: Have a look here: http://php.net/manual/en/book.fdf.php
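
To give an idea of how little is involved in generating the data file, here is a rough sketch of building an XFDF document. It is in Python for illustration only; the equivalent string building is just as short in PHP. The function name `build_xfdf` and the field names in the usage note are made up, and the real field names have to match the form fields defined in your PDF.

```python
from xml.sax.saxutils import escape, quoteattr


def build_xfdf(fields: dict) -> str:
    """Build an XFDF document that maps PDF form field names to values."""
    rows = "".join(
        # quoteattr() adds the surrounding quotes and escapes the attribute value
        f"    <field name={quoteattr(name)}><value>{escape(value)}</value></field>\n"
        for name, value in fields.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">\n'
        "  <fields>\n"
        f"{rows}"
        "  </fields>\n"
        "</xfdf>\n"
    )
```

Save the result of, say, `build_xfdf({"customer": "Smith & Co", "total": "42"})` as data.xfdf and merge it with `pdftk template.pdf fill_form data.xfdf output filled.pdf flatten` (the file names are placeholders). The `flatten` option turns the filled fields into regular page content so they can no longer be edited.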

Orbling
  • I will check out whether I can get `pdftk` running on the shared webhost. It seems ideal – Pekka Dec 11 '10 at 12:42
  • @Pekka It builds fairly well, I've had it running on a fair few hosting configurations. The dependencies are pretty standard. – Orbling Dec 11 '10 at 12:46
  • To complement this answer: http://stackoverflow.com/questions/1389964/merge-fdf-data-into-a-pdf-file-using-php – RabidFire Dec 23 '10 at 16:18
  • @Pekka: Was this approach a non-starter then? – Orbling Dec 23 '10 at 19:35
  • @Orbling sorry, I haven't got around to trying this out yet. Will provide feedback soon – Pekka Dec 29 '10 at 10:48
  • @Pekka: NP, noticed the bounty added to it and thought that meant existing efforts had proved fruitless. – Orbling Dec 29 '10 at 11:29
  • I use PDFTK for online certs and label printing. It's quite handy, but unfortunately many shared hosting providers won't install it (they'll try to get you to upgrade to your own dedicated server). However, if you can find a host willing to use it, check out pdftk-php: https://github.com/andrewheiss/pdftk-php. It makes it very easy to take your data from a database and insert it into a PDF form. For best results, create your form in LiveCycle (included with Acrobat Pro) as opposed to Acrobat – WebChemist Jan 27 '11 at 00:21
  • @Orbling it will take a few more days for me to work this out, but I'm marking this accepted because I have pdftk on the server. Cheers! – Pekka Feb 02 '11 at 11:38
  • @Pekka: Glad it is of help. PDF generation on-the-fly is a right pain. I work in the print industry, *I understand*. ;-) – Orbling Feb 02 '11 at 12:38
5

Have you considered using something like XSL Formatting Objects (XSL-FO)? Basically they're XML documents that are processed and turned into PDFs. Doing string - or better, DOM - replacements within that should be pretty simple. It supports embedding images, links, annotations, etc.

It's not PHP, but there are a number of PHP wrappers for it along with ways of using it via exec, etc. Not ideal, but it takes care of the template portion completely. For some more info: http://techportal.inviqa.com/2009/12/16/transforming-xml-with-php-and-xsl/

There's an implementation available as an Apache project - http://xmlgraphics.apache.org/fop/
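
As a rough illustration of the DOM-style replacement mentioned above, here is a small sketch. It is in Python for illustration only (PHP's DOM extension would follow the same pattern) and assumes `{{name}}` placeholders, which is an invented convention rather than anything XSL-FO defines. The function name `fill_fo_template` is hypothetical.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # keep the conventional fo: prefix on output


def fill_fo_template(fo_source: str, values: dict) -> str:
    """Replace {{name}} placeholders in the text nodes of an XSL-FO document."""
    root = ET.fromstring(fo_source)
    for node in root.iter():
        # ElementTree stores character data in .text (before the first child)
        # and .tail (after the element's closing tag), so both are checked.
        for attr in ("text", "tail"):
            content = getattr(node, attr)
            if content:
                for name, value in values.items():
                    content = content.replace("{{" + name + "}}", value)
                setattr(node, attr, content)
    # Serialising through the XML library escapes &, < and > in the inserted values.
    return ET.tostring(root, encoding="unicode")
```

The resulting document can then be handed to an FO processor; with FOP's command line that would be something like `fop -fo filled.fo -pdf final.pdf`, file names being placeholders.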

Shog9
CaseySoftware
  • Cheers @Casey this is very interesting, but the XML -> PDF portion (which, if I understand correctly, is what FOP could provide) is beyond the "no software install" limitation that I have for this project. Still, +1, a very clean approach – Pekka Dec 30 '10 at 21:28
  • PDF output via FO works well enough for some use cases, but since FOP's renderer is different from Word's, line and page breaks may fall in different places. For this reason, a higher fidelity commercial alternative may be of interest: http://www.docx4java.org/blog/2015/02/high-fidelity-pdf-output/ – JasonPlutext Feb 14 '15 at 20:52
3

Use fpdf. There is also an extension on top of it, whose name I can't remember, which allows you to import templates.

renevdkooi
  • [FPDI](http://www.setasign.de/products/pdf-php-solutions/fpdi/) is what you are referring to I believe. – Orbling Jan 27 '11 at 00:29
3

Your best bet would be to generate the entire document on the fly, with the template defined programmatically using fpdf or something similar. That way, your text will not be cut off by paragraphs or anything like that, and you can easily position images/other elements as required.

John Cartwright
2

Late, but you can use the open-source template designer https://github.com/applicius/dhek/releases to define placeholders/areas over any existing PDF, then load the result in PHP (it's in JSON format) and write onto the original PDF accordingly using the fpdf lib, to generate a custom PDF with dynamic data written on it.

cchantep
1

I'll add this new answer since the FDF PHP extension is now dead.

I've just followed these instructions and ended up executing one perl script then the pdftk command

I'm well aware it's far from being a real PHP solution, but it's reliable and fairly easy to implement on any *nix platform.

The tools described there are also available on Debian, just in case you were wondering.

Capsule
1

Although it's not exactly what you asked for, you may consider doing it in two steps: use a PHP templating system (smarty, dwoo) to generate an HTML page, then use a tool like Html2Pdf to convert it to PDF. I am using this approach, and the results are good (no problems with page layout etc.)

Of course it depends on your input documents (can you use HTML instead of PDF/ODT as the source?) and the complexity of their layout.

ts.
  • Cheers, this is a nice idea but won't work in my case, because I have office documents already containing the placeholders that need to be filled in (documents which the client must be able to modify autonomously). – Pekka Dec 30 '10 at 21:29
1

OK, I'll try to help you solve the problem a little.

First, the answers to a couple of your questions.

Q - Am I overlooking something?

A - No. There is a PHP PDF library that can replace placeholder variables in an existing PDF and generate a PDF file as the end result, without screwing up the layout

Q - Is there a way to do this using PDF forms?

A - Yes, absolutely. The trick is to use PDF forms.

For both, you can use Justin Koivisto's PHP library for filling PDF form fields. For more detail, see http://koivi.com/fill-pdf-form-fields/tutorial.php.

Credit to Justin Koivisto for his work

P.S

As a workaround for displaying table-like output in a PDF form, consider reading the Oracle Business Intelligence Publisher User's Guide - Creating a PDF Template

Gajahlemu
0

It's a little bit late, but have a look at the PDFTemplate Library; it does exactly what you want. You can create Open Document files (odt) and add placeholders to them. The PDFTemplate library can fill in these placeholders (even with images) and create a PDF file.

ODT Files with placeholders to PDF