
I am trying to create a PDF editing prototype using the PdfTron software.

I have successfully created an interface where the user can click on an image rendered from the PDF, select a region, and is then presented with a text input where he/she can enter text that will replace the content in the PDF file.

Now the text-replacing part is problematic. Since there is no API documentation for Python (only examples), I am following the Java/Android API documentation.

Here is where I am for now. I have the following code to find the elements inside the user-selected rectangle. The values x1, y1, x2, y2 are PDF coordinates based on the user's selection in the front end.

rect = Rect(x1, y1, x2, y2)
text = ''
extractor = TextExtractor()
extractor.Begin(page)
line = extractor.GetFirstLine()
words = []
while line.IsValid():
    word = line.GetFirstWord()
    while word.IsValid():
        elRect = word.GetBBox()
        elRect.Normalize()
        if elRect.IntersectRect(elRect, rect):
            text += ' ' + word.GetString()
            words.append(word)
        word = word.GetNextWord()
    line = line.GetNextLine()

words is basically an array where I store the content that will later need to be replaced with a new element.
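
The overlap test used in the loop can be sketched without PdfTron. This is a minimal illustration in plain Python, assuming rectangles are (x1, y1, x2, y2) tuples in PDF coordinates. The names normalize and intersects are hypothetical helpers, not part of the PDFNet API, and treating rectangles that only touch at an edge as non-overlapping is an assumption of this sketch, not documented PDFNet behavior.

```python
def normalize(rect):
    """Return (left, bottom, right, top) regardless of corner order."""
    x1, y1, x2, y2 = rect
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def intersects(a, b):
    """True if the two rectangles share a region of non-zero area.

    Assumption: rectangles that merely touch along an edge do not count.
    """
    ax1, ay1, ax2, ay2 = normalize(a)
    bx1, by1, bx2, by2 = normalize(b)
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


selection = (0, 0, 10, 10)       # hypothetical user selection
word_box = (15, 15, 5, 5)        # corners given in reverse order
print(intersects(selection, word_box))
```

Normalizing first matters because a selection dragged from right to left or top to bottom produces swapped corners.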

Now the problem. I want the new element to have the same style and font as the old text. The API documentation tells me that using

style = words[0].GetStyle()

gives me style of the word and I can get font from style using

font = style.GetFont()

Doc: https://www.pdftron.com/pdfnet/mobile/docs/Android/pdftron/PDF/TextExtractor.Style.html

But the returned font is of the Obj class, not the Font class.

And apparently creating a new text element with a font requires an object of the Font class.

Because

element = eb.CreateTextBegin(font, 10.0)

generates an error:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/alan/.virtualenvs/pdfprint/local/lib/python2.7/site-packages/PDFNetPython2.py", line 5056, in CreateTextBegin
    def CreateTextBegin(self, *args): return _PDFNetPython2.ElementBuilder_CreateTextBegin(self, *args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'ElementBuilder_CreateTextBegin'.
  Possible C/C++ prototypes are:
    pdftron::PDF::ElementBuilder::CreateTextBegin(pdftron::PDF::Font,double)
    pdftron::PDF::ElementBuilder::CreateTextBegin()

Perhaps there is a better approach to achieving the same result?

Edit1

Reading the docs, I found that you can create a Font object from an Obj like this:

font = Font(style.GetFont())

I am still stuck on creating an element with those styles, though.

/edit1

Edit2

I use the following code to test writing into the file:

style = elements[0].GetStyle()
font = Font(style.GetFont())
fontsize = style.GetFontSize()
eb = ElementBuilder()
element = eb.CreateTextBegin(font, 10.0)
writer.WriteElement(element)
element = eb.CreateTextRun('My Name')
element.SetTextMatrix(10, 0, 0, 10, 100, 100)
gstate = element.GetGState()
gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetStrokeColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetStrokeColor(ColorPt(1, 1, 1))
element.UpdateTextMetrics()
writer.WriteElement(element)
writer.WriteElement(eb.CreateTextEnd())
writer.End()
from core.helpers import ensure_dir
ensure_dir(output_filename)
doc.Save(output_filename, SDFDoc.e_linearized)
doc.Close()

What I can't figure out is:

  1. How to copy styles from an existing element.
  2. How to position the new element in the document.
  3. Why this test code does not give me visible results. As far as I can see, a new file gets created, but it does not have "My Name" anywhere in it.

/Edit2

Odif Yltsaeb

1 Answer


Based on the code above it looks like you want to append some text to an existing page based on the font style (font name + color) used by the first word on the page.

There are a couple of issues with the above code. You are setting the stroke color rather than the fill color:

gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetStrokeColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetStrokeColor(ColorPt(1, 1, 1))

Try this instead:

gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetFillColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetFillColor(ColorPt(1, 0, 0))  # hardcoded to red, for testing purposes only

The main issue is most likely related to font handling. You are reusing an existing font and assuming that it uses 'standard encoding', but it likely does not. Also, fonts in existing PDFs are often subsetted (the font does not contain its full set of glyphs, only those for the characters that appear in the document). As a result, you may see .notdef glyphs or whitespace instead of the expected text. This and some other issues are covered here:

https://groups.google.com/d/msg/pdfnet-sdk/RBTuJG2uILk/pGkrKnqZ_YIJ
https://groups.google.com/d/msg/pdfnet-sdk/2y8s5aehq-c/xyknr9W5r-cJ

As a solution, instead of using the embedded font directly, you can find a matching system font (e.g. based on the font name and other properties) and create a new font. PDFNet offers a utility method Font.Create(doc, font), or Font.Create(doc, "Font name").
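
Subsetted fonts can usually be recognized by name: the PDF specification prefixes the name of a font subset with a tag of six uppercase letters and a plus sign (for example ABCDEF+Arial). The following is a minimal sketch in plain Python, assuming the font name has already been read from the document as a string. split_subset_prefix is a hypothetical helper, not a PDFNet call.

```python
import re

# A subset tag is exactly six uppercase ASCII letters followed by '+'.
_SUBSET_PREFIX = re.compile(r'^([A-Z]{6})\+(.+)$')


def split_subset_prefix(font_name):
    """Split a font name into (subset_tag, base_name).

    subset_tag is None when the name carries no subset prefix.
    """
    match = _SUBSET_PREFIX.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name


tag, base = split_subset_prefix("ABCDEF+Arial")   # hypothetical font name
print(tag, base)
```

If a tag is present, the font is a subset, and the base name is a reasonable candidate when looking for a matching system font by name.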

These methods will create a Unicode font, so you should use eb.CreateUnicodeTextRun() rather than eb.CreateTextRun().

Alternatively, you could use an AcroForm as a template (see the InteractiveForms sample) and pdfdoc.FlattenAnnotations() to end up with a read-only version of the document.

Brad Larson
  • I can't use the replacer because, looking at the replacer code, it seems to expect to find full strings in one text element. But in my test case, the PDF in some areas has every single letter in a different text node/element. In any case, the replacer code ran but failed to produce any results, just like the code I posted. Now about the fonts: the fonts should be fully present in the file. But I'll test what happens if I load my own font file. – Odif Yltsaeb Apr 01 '14 at 21:37