TL;DR
How do you create a PDF from a JSON object that contains a string written in HTML?
Example JSON:
{
    "dimensions": {
        "height": 297,
        "width": 210
    },
    "boxes": [
        {
            "dimensions": {
                "height": 10,
                "width": 190
            },
            "position": {
                "x": 10,
                "y": 10
            },
            "content": "<h1>Hello StackOverflow</h1>, I think you are <strong>great</strong>! I hope someone can answer this!"
        }
    ]
}
Tech used in the front-end: AngularJS 1.4.9, ui.tinymce, ment.io
Back-end: whatever works.
I want to be able to create templates for PDFs. The user writes some text in a textarea, using variables that will later be replaced with actual data, and when the user presses a button, a PDF with the finished product is returned. This should be very generic, so that it can be used in pretty much any project.
So, a minimal example: the user writes a little text in TinyMCE, like:
<h1>Hello #[COMMUNITY]</h1>, I think you are <strong>great</strong>! I hope someone can answer this!
This text contains two variables, which the user picks with the help of the ment.io plugin. The actual variables are supplied by the controller. The text is written in an AngularJS version of TinyMCE with ment.io on top of it, which gives a nice view of the available variables.
When the user presses the Save button, a JSON object like the following is created; this is the template.
{
    "dimensions": {
        "height": 297,
        "width": 210
    },
    "boxes": [
        {
            "dimensions": {
                "height": 10,
                "width": 190
            },
            "position": {
                "x": 10,
                "y": 10
            },
            "content": "user input"
        }
    ]
}
I have an Angular directive that can generate any number of boxes, in any size (generic-ho!). This part works great. You simply pass in how big you want the 'page' to be (in mm, so the example is A4 paper) in the first dimensions object. Then, in the boxes, you define how big each box should be and where on the 'paper' it should go. Finally, there is the content, which the user writes in a TinyMCE textarea.
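Whatever library ends up drawing the PDF, the millimetre values in the template have to be converted to PDF points (1 pt = 1/72 inch, 1 inch = 25.4 mm), and PDF coordinates start at the bottom-left corner of the page. Below is a minimal sketch, assuming positions in the template are measured from the top-left corner of the page; the record names are hypothetical and simply mirror the JSON:

```java
import java.util.List;

public class TemplateGeometry {

    // Hypothetical Java mirror of the JSON template (all values in mm).
    public record Size(double width, double height) {}
    public record Position(double x, double y) {}
    public record Box(Size dimensions, Position position, String content) {}
    public record Template(Size dimensions, List<Box> boxes) {}

    /** One PDF point is 1/72 inch and one inch is 25.4 mm. */
    public static double mmToPt(double mm) {
        return mm * 72.0 / 25.4;
    }

    /**
     * Returns {x, y, width, height} in points, where (x, y) is the lower-left
     * corner of the box in PDF user space (origin at the bottom-left).
     * Assumes the template measures y from the top edge of the page.
     */
    public static double[] toPdfRect(Template page, Box box) {
        double x = mmToPt(box.position().x());
        double w = mmToPt(box.dimensions().width());
        double h = mmToPt(box.dimensions().height());
        // Flip y: distance from the bottom of the page to the bottom of the box.
        double y = mmToPt(page.dimensions().height()
                - box.position().y() - box.dimensions().height());
        return new double[] {x, y, w, h};
    }

    public static void main(String[] args) {
        Box box = new Box(new Size(190, 10), new Position(10, 10), "user input");
        Template a4 = new Template(new Size(210, 297), List.of(box));
        double[] r = toPdfRect(a4, box);
        System.out.printf("x=%.2f y=%.2f w=%.2f h=%.2f%n", r[0], r[1], r[2], r[3]);
    }
}
```

For the A4 example, the first box would start 28.35 pt from the left edge, and its bottom edge would sit 785.20 pt above the bottom of the page.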
Next step: the back-end replaces the variables with actual data and then passes the result on to the generator.
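The replacement step could look like the sketch below. It assumes variables appear in the stored HTML literally as `#[NAME]` (ment.io or TinyMCE might wrap or encode them differently), and it HTML-escapes the substituted values, since they end up inside markup. `VariableFiller` and `fill` are hypothetical names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableFiller {

    // Assumed placeholder syntax: #[NAME] with letters, digits or underscores.
    private static final Pattern VAR = Pattern.compile("#\\[([A-Za-z0-9_]+)\\]");

    /** Replaces every #[NAME] with the HTML-escaped value from the map. */
    public static String fill(String html, Map<String, String> values) {
        Matcher m = VAR.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for variable " + m.group(1));
            }
            // quoteReplacement stops '$' or '\' in the value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(escapeHtml(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Minimal escaping for text inserted into HTML.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

Failing on a missing value, instead of leaving `#[NAME]` in the PDF, is a design choice; a more lenient version could substitute an empty string.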
Then we come to the tricky part: the actual generator. It should preferably accept JSON, so that any project can use it. The front-end and the PDF generator go hand in hand; they don't care what's in the middle. This means the generator can be written in pretty much any language. I'm a Java developer, though, so Java is preferable (hence the Java tag).
Solutions I've found are:
PDFBox: the problem with using it is the content that TinyMCE produces. TinyMCE outputs HTML or XML, and PDFBox does not handle markup at all. That means I would have to write my own HTML or XML parser to figure out where the user wants bold text, italics, headings, another font, etc. And I really don't want that; I've been burned by that before. On the other hand, PDFBox is great for placing text in exactly the right places, even if it is only raw text.
I've read that iText handles HTML, but the AGPL license pretty much kills it.
I've also looked at Flying Saucer, which takes XHTML and creates a PDF, but it seems to rely on iText.
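Because every box already has a size and position in millimetres, the template maps fairly directly onto a single XHTML page that uses CSS absolute positioning and an `@page` size rule, which is the kind of input an XHTML-to-PDF renderer such as Flying Saucer consumes. A minimal sketch of building that page string, assuming the box content is already well-formed XHTML (for example after JTidy); the class and record names are hypothetical, and the rendering call itself is not shown:

```java
import java.util.List;
import java.util.Locale;

public class XhtmlPageBuilder {

    // Hypothetical flattened box: mm values measured from the top-left corner.
    public record Box(double x, double y, double width, double height, String html) {}

    public static String build(double pageWidthMm, double pageHeightMm, List<Box> boxes) {
        StringBuilder sb = new StringBuilder();
        sb.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><style>")
          // Page size comes straight from the template; no margin, so offsets are page-absolute.
          .append(String.format(Locale.ROOT, "@page { size: %.2fmm %.2fmm; margin: 0; }",
                  pageWidthMm, pageHeightMm))
          .append(" body { margin: 0; } .box { position: absolute; overflow: hidden; }")
          .append("</style></head><body>");
        for (Box b : boxes) {
            // Each template box becomes one absolutely positioned div.
            sb.append(String.format(Locale.ROOT,
                    "<div class=\"box\" style=\"left:%.2fmm; top:%.2fmm; width:%.2fmm; height:%.2fmm;\">",
                    b.x(), b.y(), b.width(), b.height()));
            // Content is inserted as-is, so it must already be well-formed XHTML.
            sb.append(b.html()).append("</div>");
        }
        return sb.append("</body></html>").toString();
    }
}
```

The appeal of this route is that the browser-side layout (boxes in mm) and the PDF layout are expressed in the same CSS terms, so no custom markup parser is needed.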
The solution I'm looking at now is a convoluted way to use Apache FOP. FOP works on an XSL-FO document, so the trouble here is creating that XSL-FO document dynamically. I've also read that development of the XSL-FO standard has been dropped, so I'm unsure how future-proof this approach is. I've never worked with either FOP or XSLT, so the task seems daunting. What I'm currently considering is: take the output from TinyMCE, run it through something like JTidy to get XHTML, create an XSLT file from the XHTML (in some magical way), produce an XSL-FO document from the XHTML and the XSLT, and then generate the PDF from the XSL-FO file. Please tell me there is an easier way.
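On the "magical" XSLT step: the stylesheet probably does not have to be generated per document. One fixed XSLT that maps each supported XHTML element to an XSL-FO equivalent can be applied to every box. A minimal sketch using only the JDK's built-in `javax.xml.transform` API, handling just `h1`, `p`, `strong` and `em` (the class name and the element choice are assumptions). The result is an FO fragment; it would still need to be wrapped in `fo:root`/`fo:page-sequence` (each box could become an `fo:block-container` with `absolute-position="absolute"`) before being handed to FOP, which is not shown:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XhtmlToFo {

    // Fixed, deliberately tiny stylesheet; unknown elements fall through to
    // XSLT's built-in rules, which keep their text.
    private static final String XSLT = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:fo="http://www.w3.org/1999/XSL/Format">
          <xsl:template match="/">
            <fo:block><xsl:apply-templates/></fo:block>
          </xsl:template>
          <xsl:template match="h1">
            <fo:block font-size="24pt" font-weight="bold"><xsl:apply-templates/></fo:block>
          </xsl:template>
          <xsl:template match="p">
            <fo:block><xsl:apply-templates/></fo:block>
          </xsl:template>
          <xsl:template match="strong">
            <fo:inline font-weight="bold"><xsl:apply-templates/></fo:inline>
          </xsl:template>
          <xsl:template match="em">
            <fo:inline font-style="italic"><xsl:apply-templates/></fo:inline>
          </xsl:template>
        </xsl:stylesheet>
        """;

    public static String toFo(String xhtmlFragment) throws TransformerException {
        // Wrap the box content so mixed text and elements form one XML document.
        String doc = "<div>" + xhtmlFragment + "</div>";
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(doc)), new StreamResult(out));
        return out.toString();
    }
}
```

Supporting another TinyMCE feature then means adding one template rule, rather than writing a parser.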
I can't have been the first to want to do something like this. Yet searching for answers seems to yield very few actual results.
So my question is basically this: how do you create a PDF from a JSON object like the one above, which contains HTML, and get the resulting text to look like it does when you write it in TinyMCE? Keep in mind that the object can contain an unlimited number of boxes.