I have a PDF form with placeholder keys, and I need to replace them with actual data. As I understand it, Apache PDFBox can do this. Is this possible with Apache PDFBox? Do you have examples of how to replace text with this library?

I tried to do it this way:

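import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

public void replaceText() throws IOException {
    try (PDDocument document = PDDocument.load(new File("test_3.pdf"))) {
        for (PDPage page : document.getPages()) {
            // Parse the page's content stream into operands and operators.
            PDFStreamParser parser = new PDFStreamParser(page);
            parser.parse();
            List<Object> tokens = parser.getTokens();

            for (int i = 1; i < tokens.size(); i++) {
                Object next = tokens.get(i);
                // Tj takes a single string operand, stored just before the operator.
                if (next instanceof Operator && "Tj".equals(((Operator) next).getName())
                        && tokens.get(i - 1) instanceof COSString) {
                    COSString previous = (COSString) tokens.get(i - 1);
                    String string = previous.getString();
                    if ("goal".equals(string)) {
                        System.out.println(string);
                    }
                    string = string.replaceFirst("goal", "GOAL==");
                    // Assumes a single-byte, WinAnsi-like font encoding.
                    previous.setValue(string.getBytes(StandardCharsets.ISO_8859_1));
                }
            }

            // Write the modified tokens back and attach them to this page;
            // without setContents the page keeps its old content stream.
            PDStream updatedStream = new PDStream(document);
            try (OutputStream out = updatedStream.createOutputStream(COSName.FLATE_DECODE)) {
                new ContentStreamWriter(out).writeTokens(tokens);
            }
            page.setContents(updatedStream);
        }
        document.save("output.pdf");
    }
}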

I want to replace the keyword 'goal' with the value 'GOAL=='.

Olaf Kock
  • What have you tried so far? Please update your question and add a code snippet, if available. – MWiesner Mar 10 '20 at 13:38
  • Text can usually not be replaced that way, see https://pdfbox.apache.org/2.0/migration.html#why-was-the-replacetext-example-removed . If your PDF has an AcroForm, then you should be able to set the values if you know the field names by using the field API. To find the field names, open the file with PDFDebugger and hover over the fields. – Tilman Hausherr Mar 10 '20 at 17:10
  • Essentially: If you have a PDF generated in a special way (in particular using fonts with **WinAnsiEncoding** or a similar encoding which are not subset-embedded, and drawing a whole line of text at once, in particular not applying kerning), it can be edited using your code with some minor corrections. If your PDF can be arbitrary, actual automatic editing is extremely difficult or even impossible. There are some in-between stages with PDFs subject to other restrictions in which case editing may be feasible. In general, though, use AcroForm form fields instead and fill and flatten them. – mkl Mar 11 '20 at 09:52
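
To illustrate the point about kerning in the comment above, here is a minimal toy model in plain Java. It does not use the PDFBox API: `TjReplaceSketch`, `Op`, `wholeLine`, `kernedLine`, and `replaceInTj` are hypothetical names, and a content stream is simplified to a list of strings, nested lists, and operator tokens. It mimics the Tj-only search from the question and suggests why that search can miss a word that a TJ array splits around a kerning adjustment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a page content stream; an illustration only, not the PDFBox API. */
final class TjReplaceSketch {

    /** Hypothetical stand-in for an operator token such as Tj or TJ. */
    static final class Op {
        final String name;

        Op(String name) {
            this.name = name;
        }
    }

    /** The whole word is drawn at once: (goal) Tj */
    static List<Object> wholeLine() {
        return new ArrayList<>(Arrays.<Object>asList("goal", new Op("Tj")));
    }

    /** The word is split around a kerning adjustment: [(go) -20 (al)] TJ */
    static List<Object> kernedLine() {
        List<Object> array = Arrays.<Object>asList("go", -20, "al");
        return new ArrayList<>(Arrays.<Object>asList(array, new Op("TJ")));
    }

    /**
     * Mimics the approach from the question: replace key only inside a string
     * operand that directly precedes a Tj operator.
     * Returns the number of operands that were changed.
     */
    static int replaceInTj(List<Object> tokens, String key, String value) {
        int count = 0;
        for (int i = 1; i < tokens.size(); i++) {
            Object token = tokens.get(i);
            Object operand = tokens.get(i - 1);
            if (token instanceof Op && "Tj".equals(((Op) token).name)
                    && operand instanceof String && ((String) operand).contains(key)) {
                tokens.set(i - 1, ((String) operand).replace(key, value));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The whole-word case is found and replaced.
        System.out.println(replaceInTj(wholeLine(), "goal", "GOAL=="));  // prints 1
        // The kerned case is missed: no single string operand contains "goal".
        System.out.println(replaceInTj(kernedLine(), "goal", "GOAL==")); // prints 0
    }
}
```

Font subsetting and non-WinAnsi encodings, also mentioned in the comment, are not modelled here. With those, even a matched string may not be replaceable, because the bytes in the stream need not correspond to readable characters or the font may lack the glyphs for the new text.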

0 Answers