I am using the OpenPDF library (a fork of iText) to replace placeholders like #{Address_name_1} with real values. My PDF file is not simple, so I use a regular expression to find the placeholder:

[{].*?[A].*?[d].*?[d].*?[r].*?[e].*?[s].*?[s].*?[L].*?[i].*?[n].*?[e].*?[1].*?[}]

and then do something like this:

    content = MY_REGEXP.replace(content, "Saint-P, Nevskiy pr.")
    obj.setData(content.toByteArray(CHARSET))

The problem occurs when my replacement text is too long: it gets cut off at the right edge. Can I somehow make it wrap onto the next line? A naive \n does not work.

mkl
Bogdan Timofeev
  • The existing answers already tell you that you need to split your replacements beforehand and provide enough placeholders for your use cases. Additionally, please be aware that manipulating content streams the way you do will only work with very special documents (matching encodings, a subset font containing all required glyphs, intuitive text drawing, ...). That solution is not future-proof. – mkl Jan 20 '20 at 17:51
  • @mkl The solution is not the best (it is very limited), but I think it will have no problems in the future as long as the templates are generated in a controlled environment (by the developer). The other case is when templates are generated by users without any special care; in that case problems will arise. – user1039663 Jan 20 '20 at 18:07
  • @mkl Is there a better way to replace some content inside a .pdf file? I took this solution from https://itextpdf.com/en/resources/examples/itext-7/replacing-pdf-objects – Bogdan Timofeev Jan 24 '20 at 10:26
  • *"This solution I took from ..."* - I hope you also read the [Stack Overflow answer](https://stackoverflow.com/a/21622539/1729265) for which that example had been written (linked in the example code comments), in particular that it is only for *relatively simple PDFs* but that *in real life, PDFs are never that simple and the complexity of your project will increase dramatically with every special feature that is used in your documents.* – mkl Jan 24 '20 at 10:57
  • *"Is there a better way to replace some contents inside of .pdf file"* - how about simply using PDF AcroForm form fields? Form fields are there to be filled in, not only manually but also programmatically, and after fill-in you can flatten the form. – mkl Jan 24 '20 at 11:04

2 Answers

PDF files are NOT text files. Each line is an object with an x/y offset. To place something on the next line would require a new object placed at new x/y coords. You would need an advanced PDF editing toolkit.
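
To illustrate the point about positioning, here is a minimal Java sketch (Java because OpenPDF is a Java library; it ports directly to Kotlin) of what a multi-line text block looks like at the content-stream level. The operators BT/ET, Tf, TL, Td, T* and Tj are standard PDF text operators; the class name, method names, the font resource name F1 and the coordinates are made up for illustration. It assumes the font already exists in the page resources and that the string bytes match that font's encoding, which is often not true for real documents.

```java
import java.util.List;

final class TextBlockSketch {

    /** Escapes backslashes and parentheses so the text is a valid PDF literal string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /**
     * Builds a content-stream fragment that draws each line at its own position.
     * fontName is a hypothetical resource name (e.g. "F1") that must already be
     * defined in the page resources; x/y are in PDF user space units.
     */
    static String textBlock(String fontName, int fontSize, int x, int y,
                            int leading, List<String> lines) {
        StringBuilder sb = new StringBuilder();
        sb.append("BT\n");                                          // begin text object
        sb.append('/').append(fontName).append(' ')
          .append(fontSize).append(" Tf\n");                        // select font and size
        sb.append(leading).append(" TL\n");                         // distance between lines
        sb.append(x).append(' ').append(y).append(" Td\n");         // position of the first line
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                sb.append("T*\n");                                  // move down by the leading
            }
            sb.append('(').append(escape(lines.get(i))).append(") Tj\n"); // show one line
        }
        sb.append("ET\n");                                          // end text object
        return sb.toString();
    }
}
```

Note that a line break is an explicit move of the text position (T* here), not a character inside the string, which is why \n in the replacement has no visible effect.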

Peter Quiring

PDF stores strings in a different way. There is no such thing as a next line, only individual lines.

So you will need to add several placeholders to your template for any field whose replacement can get long, like:

#{Address_name_1_line1}
#{Address_name_1_line2}
#{Address_name_1_line3}

Place them on different lines in your template. Unused placeholders (left over because the replacement is not long enough) should be replaced with empty strings.

For longer replacements you will need to use several placeholders. How many placeholders to use and how to split the replacement should be determined by code.
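
The splitting step could look roughly like the following Java sketch (it ports directly to Kotlin). The class and method names are hypothetical, and the character limit per line is an assumption: it wraps by character count, not by measured glyph widths, so the limit has to be chosen conservatively for the font and box width in your template.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PlaceholderLines {

    /** Greedy word wrap by character count; a word longer than maxChars gets a line of its own. */
    static List<String> wrap(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Start a new line if appending this word would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    /**
     * Maps wrapped lines onto placeholders named #{<baseName>_line1} .. #{<baseName>_lineN}.
     * Unused slots map to the empty string; too much text for the slots is an error.
     */
    static Map<String, String> toPlaceholders(String baseName, String text,
                                              int maxChars, int slots) {
        List<String> lines = wrap(text, maxChars);
        if (lines.size() > slots) {
            throw new IllegalArgumentException(
                    "Text needs " + lines.size() + " lines but only " + slots + " placeholders exist");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < slots; i++) {
            String key = "#{" + baseName + "_line" + (i + 1) + "}";
            result.put(key, i < lines.size() ? lines.get(i) : "");
        }
        return result;
    }
}
```

For example, toPlaceholders("Address_name_1", "Saint-Petersburg, Nevskiy prospekt 28, apartment 5", 25, 3) yields "Saint-Petersburg, Nevskiy" for line1, "prospekt 28, apartment 5" for line2 and an empty string for line3. Each entry can then be fed to whatever replacement you already run on the content stream.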

If your PDF is too complex to place separate placeholders, then you will need to turn everything into placeholders: all of your text content would have to be injected through placeholders, at least if you want to stick with this approach.

mkl
user1039663