Interpolating custom data onto a PDF

Question

I am building an Angular test preparation app (with Laravel 5.1 API). One of the requirements is to allow the user to print a certificate of achievement.

The client wants the person's name and credentials interpolated into the document (e.g., highlighted below). Here is a snapshot of the PDF template they sent:

The way I'm handling PDF viewing is simply by storing the file on S3 and giving them a link to that file.

Interpolating information into a PDF doc doesn't seem trivial and I haven't found much information on programmatically allowing this, but there are tools like DocHub, that allow you do edit while viewing the PDF.

I'm interested in learning:

is doing this programmatically trivial?
are there 3rd party tools I'm unaware of?
would I even be able to send this information along to the S3 link to interpolate in the first place?

Look at some answers here to get started: https://stackoverflow.com/questions/34049956/generate-pdf-from-html-using-pdfmake-in-angularjs — Angad Dubey, Aug 27 '17 at 00:57
PDF is a final format in a processing chain. It in particular is not designed for easy change of content which may require existing content to be reflowed. You might consider changing the template to contain AcroForm form fields and fill them but that would leave more or less large gaps depending on the inserted name. Alternatively you may consider remodelling the template into some other, non-final format, e.g. html, insert values in that format, and then transform to pdf. — mkl, Aug 27 '17 at 09:56

score 1 · Answer 1 · answered Aug 28 '17 at 10:05

If you're already using Amazon services, why not use an amazon lambda function, in combination with iText7 (java), to generate the pdf on demand?

That way, you are guaranteed that the pdf is correct, and has nice layout every time.

Generating the pdf can either be done by:

converting HTML,
programmatically creating your entire document,
filling and flattening an XFA form.

I think for your use-case, either option 1 or 2 are the most sustainable.

score 1 · Accepted Answer · answered Aug 28 '17 at 10:37

Using PDF as a format for editing is usually a bad choice. If you have a form with fixed fields, then it's easy. Create a PDF template with an interactive form. In this form, based on AcroForm technology, you'll define fields with fixed coordinates, and a fixed size. You can then add content to these fields.

One major disadvantage with this approach is the lack of flexibility. Did you notice that I used the word "fixed" three times in the previous paragraph? If text doesn't fit the predefined field, you're out of luck. If the field is overdimensioned, you'll end up with plenty of white space. This approach is great if you can predict what the data will be like. A typical use case is a ticket or a voucher. For instance: the empty form is a really nice page, with only a couple of fields where an automated system can put a name, a date, a time, and a seat number.

This isn't the best approach for the example you show in your screen shot. The position of every line of text, every word, every character is known in advance. If you want to replace a short word with a long word (or vice-versa), then all those positions (of each line, of the complete page, possibly of the complete document) need to be recalculated. That's madness. Only people with very poor design skills come up with such an idea.

A better idea, is to store the template as HTML. See for instance chapter 5 of iText's pdfHTML tutorial, where we have this snippet of HTML:

<html>
    <head>
        <title>Invitation to SXSW 2018</title>
    </head>
    <body>
        <u><b>Re: Invitation</b></u>
        <br>
        <p>Dear <name>SXSW visitor</name>,
        we hope you had a great SXSW film festival experience last year.
        And we would like to invite you to the next edition of SXSW Film
        that takes place from March 9 until March 17, 2018.</p>
        <p>Sincerely,<br>
        The SXSW crew<br>
        <date>August 4, 2017</date></p>
    </body>
</html>

Actually, it's not really HTML, because the <name> tag and the <date> tag don't exist in HTML. All HTML processors (browsers as well as pdfHTML) ignore those tags and treat their content as if the tag was a <span>:

It doesn't make much sense to have such tags in the context of pure HTML, but it does make a lot of sense in the case of pdfHTML. With pdfHTMLL, you can configure custom tags, and have a result that looks like the PDFs shown below:

Look at the document for "John Doe" and compare it with the document for "Bruno Lowagie". The name "John Doe" is much shorter than my name, hence more words fit on that first line. The text flows nicely (we could also have chosen to justify the text on both sides). This "flow" is impossible to achieve with your approach, because you will never get a PDF template to reflow nicely.

OK, I get it, you probably say, but what about the practical aspects? You talk about a Java / .Net library, but I am working with Laravel and Angular.js. First, let me tell you that I don't think you'll find any good PDF tools for Laravel or Angular.js, because of the nature of PDF and those development environments (in my opinion, those technologies don't play well together). Regardless of my opinion, this shouldn't be much of a problem for you because you work in an Amazon environment. AWS supports Java, and the Java code needed to get pdfHTML working is minimal. Most of the code samples I wrote for the pdfHTML tutorial are shorter than 15 lines. So why not try Java and pdfHTML?

Interpolating custom data onto a PDF

2 Answers2