
I have a source PDF which I am modifying by adding text objects, using the "incremental updates" mechanism described in the PDF specification. But I am making some mistakes when adding text objects this way, so the PDF doesn't behave properly in Adobe Reader 11: when the PDF is opened and I double-click on it, the added text objects get deleted. I figured out that this is due to the text annotation.

Now I want to know: how can a new text object be added using an incremental update? How must the Contents and RC entries of a free text annotation be maintained?

Also, is it possible to disable or delete the annotation so that my problem can easily be avoided? I want a simple PDF; I don't want annotation options.

The source pdf I am using is here.

The modified pdf after adding text object is here.

I am not sure whether the source PDF itself conforms to the PDF specification.

Jason Sundram
IT researcher
  • You try to add text to the PDF using free text annotations. This can be quite troublesome as discussed in a former question. Therefore, **is it a requirement that you add the text as such an annotation?** Or **would it also be ok to add the text as regular PDF page content?** – mkl Mar 01 '13 at 10:22
  • Thank you for your reply. I want to add the text as regular PDF page content. I don't want any annotations. How can I do it? – IT researcher Mar 01 '13 at 10:25
  • For which programming language do you need a solution? In case of Java or .Net I could point you towards samples using iText(Sharp) (my personally favoured PDF library) while others surely could show samples using other PDF libraries. – mkl Mar 01 '13 at 10:34
  • Thanks. We are using VB6. – IT researcher Mar 01 '13 at 11:17
  • Also, we want to specify the font, font size, etc. – IT researcher Mar 01 '13 at 11:29
  • Hhmmm, I don't know which PDF libraries can still be used in a VB6 environment. Specifying font and font size for additions shouldn't be a problem for any decent generic PDF library. – mkl Mar 01 '13 at 13:16
  • Can you tell me how I can change the PDF manually, according to the PDF specification, so that the problem is solved? – IT researcher Mar 01 '13 at 14:48
  • Is your source PDF generic, or from a limited selection allowing you to collect some information beforehand? In the latter case you might get away with some "manual procedure". – mkl Mar 01 '13 at 20:34
  • @mkl I have tried adding some characters to the PDF in two different fonts. The source PDF I used is http://incometaxsoft.com/pdf/source.pdf and the modified PDF is http://incometaxsoft.com/pdf/modified1.pdf – IT researcher Mar 02 '13 at 08:13
  • @mkl Please read my previous comment. The problem now is that in the modified PDF I added the text abc..z and ABC...Z just for testing, but letters such as b, j, k, q, v etc. do not appear in the PDF. Also, the x and y positions of the text are not displayed properly. Can you please tell me what causes this problem? – IT researcher Mar 02 '13 at 08:25
  • I'm not in the office during the weekend and, therefore, don't have all my tools at hand. Based on your description, though, I would surmise that you use a font which is only partially embedded (a standard technique to limit the size of the created PDFs). If you need characters other than the actually embedded ones, you have to add your own font information, too. And here you definitely are at a point where you don't want to code stuff "manually". – mkl Mar 02 '13 at 09:26
  • Can you, in your project, access and use .Net libraries? In that case I could reference an example in C# which might show you a way. – mkl Mar 02 '13 at 09:58



First off let me show you how easy things are if you can use a decent PDF library. I use iTextSharp as an example but the same can also be done with others like PDFBox or PDFNet (already mentioned by @Ika in his answer):

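using iTextSharp.text;
using iTextSharp.text.pdf;

// sourcePdf: the input file; targetPdfStream: a writable Stream for the result.
// For an incremental update (append mode) use
// new PdfStamper(reader, targetPdfStream, '\0', true) instead.
PdfReader reader = new PdfReader(sourcePdf);
using (PdfStamper stamper = new PdfStamper(reader, targetPdfStream)) {
  // 12pt Helvetica Bold in light gray
  Font FONT = new Font(Font.FontFamily.HELVETICA, 12, Font.BOLD, new GrayColor(0.75f));
  // Content drawn over the existing content of page 1
  PdfContentByte canvas = stamper.GetOverContent(1);
  ColumnText.ShowTextAligned(
    canvas,
    Element.ALIGN_LEFT, 
    new Phrase("Hello people!", FONT), 
    36, 540, 0   // x, y, rotation
  );
}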

(Derived from the Webified iTextSharp Example StampText.cs explained in chapter 6 of iText in Action — 2nd Edition.)

(Which PDF library you choose depends on your general requirements and the available license models.)

If, in spite of the ease of use of such PDF libraries, you insist on doing it manually, here are some remarks:

First you have to find the Page dictionary of the page you want to add content to. Depending on the type of PDF, this might already require decompressing object streams etc., but in your sample modified1.pdf that is not necessary:

7 0 obj
  <</Rotate 90
    /Type /Page
    /TrimBox [ 9.54 6.12 585.68 835.88 ]
    /Resources 8 0 R
    /CropBox [ 0 0 595.22 842 ]
    /ArtBox [ 9.54 18.36 585.68 842 ]
    /Contents [ 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R ]
    /Parent 6 0 R
    /MediaBox [ 0 0 595.22 842 ]
    /Annots 17 0 R
    /BleedBox [ 9.54 6.12 585.68 835.88 ]
  >>
endobj 

You see the array of references to content streams. This is where new page content has to be added: you can either manipulate an existing stream or create a new stream and add it to that array.

(Most PDFs have their content streams compressed. In the general case, therefore, you'd have to decompress a stream before you can work on it. Thus, in my eyes, the easier way is to start a new stream.)
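If you do start a new stream, its /Length entry must equal the number of bytes between the EOL after `stream` and the EOL before `endstream`; a wrong length is a common reason for a hand-edited PDF to misbehave. The following minimal sketch in plain Java (no PDF library; the class and method names are hypothetical) serializes an uncompressed stream object, assuming the content only contains characters in the range 0..255:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: serializes an uncompressed PDF stream object.
final class StreamObject {

    // Returns "num gen obj <</Length n>> stream ... endstream endobj".
    // The EOL written before "endstream" is not counted in /Length,
    // so n is simply the byte length of the content.
    static String build(int objNum, int gen, String content) {
        // ISO-8859-1 encodes each char 0..255 as exactly one byte
        // (other chars would become '?', hence the stated assumption).
        int length = content.getBytes(StandardCharsets.ISO_8859_1).length;
        return objNum + " " + gen + " obj\n"
            + "<</Length " + length + ">>\n"
            + "stream\n"
            + content + "\n"
            + "endstream\n"
            + "endobj\n";
    }

    public static void main(String[] args) {
        // A stream like the "q" stream 69 0 that iText prepends further below.
        System.out.print(build(69, 0, "q"));
    }
}
```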

You chose to manipulate the last referenced stream 16 0 which in your PDF is uncompressed:

16 0 obj
<</Length 37 0 R>>
stream
  S 1 0 0 1 13.183 0 cm 0 0 m
  [...]
  0 10 -10 -0 506.238 342.629 Tm
  .13333 .11765 .12157 scn
  -.0002 Tc
  .0006 Tw
  (the Bank and branch on which cheque is drawn\).)Tj

  /F1 2 Tf
  -15.1279 10.9462 Td
  (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*aaaaaaaaaaaaa)Tj

  /F2 1 Tf
  015.1279 01.9462 Td
  (ANAabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789)Tj

  ET
endstream
endobj 

Your additions, I gather, are the two 3-liners at the bottom which first select a font, then position the insertion point and finally print a selection of letters.
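Note that these lines sit before the existing ET, so they still run inside the old text object: Td moves relative to the start of the current line in the text space set up by `0 10 -10 -0 506.238 342.629 Tm` (scaled by 10 and rotated), and the earlier Tc, Tw and scn settings still apply. This may be part of the positioning problem discussed below. A safer pattern is a separate BT ... ET text object with an explicit Tm. The sketch below (plain Java, hypothetical names) builds such a text object and escapes the literal string the way the specification requires for backslash and parentheses, as in the `\)` of your original stream:

```java
import java.math.BigDecimal;

// Hypothetical helper: builds a separate BT ... ET text object.
final class TextOps {

    // Escapes backslash and parentheses for a PDF literal string,
    // e.g. "drawn)." -> "(drawn\).)".
    static String literal(String s) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') sb.append('\\');
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // PDF numbers must not use exponent notation, so avoid Double.toString.
    static String num(double v) {
        return BigDecimal.valueOf(v).stripTrailingZeros().toPlainString();
    }

    // fontName must be a key of the page's /Font resources.
    // BT resets only the text matrix; text state (Tc, Tw), colour and the
    // current transformation matrix from earlier content still apply.
    static String showText(String fontName, double size, double x, double y, String text) {
        return "BT\n"
            + "/" + fontName + " " + num(size) + " Tf\n"
            + "1 0 0 1 " + num(x) + " " + num(y) + " Tm\n"
            + literal(text) + " Tj\n"
            + "ET";
    }

    public static void main(String[] args) {
        System.out.println(showText("Xi0", 12, 36, 540, "Hello (people)!"));
    }
}
```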

Now, you say you added the text abc..z and ABC...Z just for testing, but letters such as b, j, k, q, v etc. do not appear in the PDF. The problem becomes even more visible in your second addition of letters: here only the capital 'A' and 'N' are displayed.

(Screenshot: the added letter groups as rendered)

This is because the fonts in question are embedded in the PDF (fonts are embedded so that PDF viewers on systems which don't have the font installed can still display the document), but not completely: only the subset of characters actually used from each font is embedded.

Let's look for the font F2 for which only 'N' and 'A' appear:

According to the page object, the page resources can be found in object 8 0:

8 0 obj
  <</Font <</F1 45 0 R /TT2 46 0 R /F2 47 0 R>>
    /ExtGState <</GS2 48 0 R>>
    /ProcSet [ /PDF /Text ]
    /ColorSpace <</Cs6 49 0 R>>
  >>
endobj 

So F2 is defined in 47 0:
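Following this chain by hand (page, then /Resources, then /Font, then the font object) is mechanical for uncompressed files like this one. A rough Java sketch of the last step, extracting the name-to-object mapping from a /Font subdictionary given as text; it is a regex shortcut with hypothetical names that assumes a flat, direct /Font dictionary like the one in 8 0, not a real PDF parser:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, regex-based shortcut; not a general PDF parser.
final class FontResources {

    // "/Font" (not "/FontDescriptor" etc.) followed by a direct dictionary.
    private static final Pattern FONT_KEY = Pattern.compile("/Font(?![A-Za-z])\\s*<<");
    // An entry such as "/F2 47 0 R": name, object number, generation.
    private static final Pattern ENTRY =
        Pattern.compile("/([^\\s/<>\\[\\]()]+)\\s+(\\d+)\\s+(\\d+)\\s+R");

    // Maps font resource names to object numbers. Returns an empty map when
    // /Font is missing or is an indirect reference (not handled here).
    static Map<String, Integer> fonts(String resources) {
        Map<String, Integer> result = new LinkedHashMap<>();
        Matcher key = FONT_KEY.matcher(resources);
        if (!key.find()) return result;
        int close = resources.indexOf(">>", key.end());
        if (close < 0) return result;
        Matcher m = ENTRY.matcher(resources.substring(key.end(), close));
        while (m.find()) {
            result.put(m.group(1), Integer.valueOf(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String res = "<</Font <</F1 45 0 R /TT2 46 0 R /F2 47 0 R>>"
            + " /ExtGState <</GS2 48 0 R>> /ProcSet [ /PDF /Text ]>>";
        System.out.println(fonts(res)); // {F1=45, TT2=46, F2=47}
    }
}
```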

47 0 obj
  <</Subtype /Type1
    /Type /Font
    /Widths [ 722 250 250 250 250 250 250 250 250 250 250 250 250 722 ]
    /Encoding 52 0 R
    /FirstChar 65
    /FontDescriptor 53 0 R
    /ToUnicode 54 0 R
    /BaseFont /ILBPOB+TimesNewRomanPSMT-Bold
    /LastChar 78
  >>
endobj 

In the referenced ToUnicode map 54 0 you see

54 0 obj
<</Length 55 0 R>>stream
  /CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
  /Registry (AAAAAA+F2+0) /Ordering (T1UV) /Supplement 0 >> def
  /CMapName /AAAAAA+F2+0 def
  /CMapType 2 def
  1 begincodespacerange <41> <4e> endcodespacerange
  2 beginbfchar
  <41> <0041>
  <4e> <004E>
  endbfchar
  endcmap CMapName currentdict /CMap defineresource pop end end
endstream
endobj 

In this mapping you see that only the character codes 0x41 'A' and 0x4e 'N' are mapped.

In your document the font is used only to print "NA" in the amount table cells and for nothing else. Thus, only those two letters 'N' and 'A' are embedded, which results in your addition with that font only outputting these letters.

Thus, to successfully add text to the page, you either have to check the font resources associated with the page for the glyphs they provide (and restrict your additions to those glyphs), or you have to add your own font resource.
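For the first option, a heuristic check in the same spirit as the analysis above is to compare the characters you want to print with the bfchar entries of the font's ToUnicode CMap. The Java sketch below (hypothetical names) does that; it ignores bfrange entries and multi-character mappings, and a mapped code still does not prove that the glyph is in the embedded subset, so treat a clean result as a hint only:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic: which characters does a ToUnicode CMap map at all?
final class ToUnicodeCoverage {

    // A "<srcCode> <dstUnicode>" pair inside beginbfchar ... endbfchar.
    private static final Pattern PAIR =
        Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    // Collects the Unicode characters targeted by bfchar entries. bfrange
    // blocks and multi-character targets (e.g. ligatures) are ignored.
    static Set<Character> mappedChars(String cmap) {
        Set<Character> chars = new TreeSet<>();
        int pos = 0;
        while ((pos = cmap.indexOf("beginbfchar", pos)) >= 0) {
            int end = cmap.indexOf("endbfchar", pos);
            if (end < 0) break;
            Matcher m = PAIR.matcher(cmap.substring(pos, end));
            while (m.find()) {
                String dst = m.group(2);
                if (dst.length() == 4) chars.add((char) Integer.parseInt(dst, 16));
            }
            pos = end;
        }
        return chars;
    }

    // Characters of text (each reported once) that the CMap does not map.
    static String missing(String cmap, String text) {
        Set<Character> mapped = mappedChars(cmap);
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!mapped.contains(c) && sb.indexOf(String.valueOf(c)) < 0) sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cmap = "1 begincodespacerange <41> <4e> endcodespacerange\n"
            + "2 beginbfchar\n<41> <0041>\n<4e> <004E>\nendbfchar\n";
        System.out.println(missing(cmap, "ANAabcNA")); // abc
    }
}
```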

As the presence of characters in the encoding often is not as easy to see as it is here (ToUnicode is optional), I would propose that you add your own font resources. The PDF specification ISO 32000-1 explains how to do that.

Furthermore, you state that the x and y positions of the text are not displayed properly. While you don't say exactly what you mean, you should be aware that in a content stream you can apply affine transformations to the coordinate system of the page, i.e. stretch, skew, rotate, and move the axes, and such changes made earlier in the page content (e.g. the cm operator at the start of stream 16 0) are still in effect where your additions start.

If you want to use the original coordinate system and not depend on the coordinate system being in a suitable state at the point of your additions, you should add an initial content stream to the page containing a q operator (which saves the current graphics state on the graphics state stack) and start your additions in a new final content stream with a Q operator (which restores the graphics state by removing the most recently saved state from the stack and making it the current state).
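With the page's existing stream references, that amounts to building a new /Contents array with the q stream first and your stream last. A small Java sketch with hypothetical names, assuming generation number 0 for every stream and that the two new object numbers (here 69 and 70) are free in your file:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for the q/Q wrapping of a page's content.
final class ContentsWrapper {

    // New /Contents array: the "q" stream first, the original streams in
    // order, and the stream with "Q" plus your additions last.
    // Assumes generation number 0 for every stream.
    static String wrap(List<Integer> existing, int saveStream, int restoreStream) {
        List<String> refs = new ArrayList<>();
        refs.add(saveStream + " 0 R");
        for (int n : existing) refs.add(n + " 0 R");
        refs.add(restoreStream + " 0 R");
        return "[" + String.join(" ", refs) + "]";
    }

    // Body of the last stream: restore the original state, save it again,
    // draw the additions, and restore once more.
    static String restoreThen(String additions) {
        return "Q\nq\n" + additions + "\nQ";
    }

    public static void main(String[] args) {
        List<Integer> old = Arrays.asList(9, 10, 11, 12, 13, 14, 15, 16);
        System.out.println(wrap(old, 69, 70));
        System.out.println(restoreThen("BT /Xi0 12 Tf 36 540 Td (Hello) Tj ET"));
    }
}
```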

EDIT: As a sample, I applied the Java equivalent of the C# code at the top to your modified1.pdf with append mode activated. The following objects were changed or added as a result:

The page object 7 0 has been updated:

7 0 obj
  <</CropBox[0 0 595.22 842]
    /Parent 6 0 R
    /Contents[69 0 R 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R 70 0 R]
    /Type/Page
    /Resources<<
      /ExtGState<</GS2 48 0 R>>
      /ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
      /ColorSpace<</Cs6 49 0 R>>
      /Font<</F1 45 0 R/F2 47 0 R/TT2 46 0 R/Xi0 68 0 R>>
    >>
    /MediaBox[0 0 595.22 842]
    /TrimBox[9.54 6.12 585.68 835.88]
    /BleedBox[9.54 6.12 585.68 835.88]
    /Annots 17 0 R
    /ArtBox[9.54 18.36 585.68 842]
    /Rotate 90
  >>
endobj 

If you compare with your former version, you see that

  • two new content streams have been added, 69 0 at the start and 70 0 at the end;
  • the resources are not an indirect object anymore but instead are directly included here;
  • the resources contain a new Font resource Xi0 at 68 0.
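In append mode, such changed objects (keeping their numbers, like 7 0) and added objects are written after the original %%EOF, followed by a new cross-reference section listing only them and a trailer whose /Prev points to the previous cross-reference section. The specification requires this new trailer to repeat the entries of the previous one (except /Prev). A minimal Java sketch of that step, assuming a classic xref table (not a cross-reference stream), generation 0 for every object, and that you pass the old trailer entries and the new /Size yourself; all names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of an incremental update with a classic xref table.
final class IncrementalUpdate {

    // Offset of the previous xref section: the number after the last "startxref".
    static long lastStartXref(byte[] pdf) {
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        int i = s.lastIndexOf("startxref");
        if (i < 0) throw new IllegalArgumentException("no startxref found");
        return Long.parseLong(s.substring(i + "startxref".length()).trim().split("\\s+")[0]);
    }

    // objects: object number -> complete "n 0 obj ... endobj" text (generation 0).
    // oldEntries: all previous trailer entries except /Size and /Prev,
    // e.g. "/Root 1 0 R /Info 2 0 R". size: highest object number + 1.
    static byte[] append(byte[] original, Map<Integer, String> objects,
                         String oldEntries, int size) {
        long prev = lastStartXref(original);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(original, 0, original.length);
        int last = original.length > 0 ? original[original.length - 1] : '\n';
        if (last != '\n' && last != '\r') write(out, "\n"); // start on a new line

        Map<Integer, Long> offsets = new TreeMap<>();
        for (Map.Entry<Integer, String> e : new TreeMap<>(objects).entrySet()) {
            offsets.put(e.getKey(), (long) out.size()); // byte offset from file start
            String body = e.getValue();
            write(out, body.endsWith("\n") ? body : body + "\n");
        }

        long xref = out.size();
        StringBuilder sb = new StringBuilder("xref\n");
        for (Map.Entry<Integer, Long> e : offsets.entrySet()) {
            // One subsection per object; each entry is exactly 20 bytes.
            sb.append(e.getKey()).append(" 1\n")
              .append(String.format(Locale.ROOT, "%010d 00000 n \n", e.getValue()));
        }
        sb.append("trailer\n<</Size ").append(size).append(' ').append(oldEntries)
          .append(" /Prev ").append(prev).append(">>\n")
          .append("startxref\n").append(xref).append("\n%%EOF\n");
        write(out, sb.toString());
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        // A stand-in for a real file, just to show the appended structure.
        byte[] original = "%PDF-1.4\n% stand-in body\nstartxref\n116\n%%EOF\n"
            .getBytes(StandardCharsets.ISO_8859_1);
        Map<Integer, String> changed = new TreeMap<>();
        changed.put(69, "69 0 obj\n<</Length 1>>\nstream\nq\nendstream\nendobj\n");
        byte[] updated = append(original, changed, "/Root 1 0 R", 70);
        System.out.print(new String(updated, StandardCharsets.ISO_8859_1));
    }
}
```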

Now let's look at the added objects.

This is the font resource for Helvetica-Bold named Xi0 at 68 0:

68 0 obj
  <</BaseFont/Helvetica-Bold
    /Type/Font
    /Encoding/WinAnsiEncoding
    /Subtype/Type1
  >>
endobj 

Non-embedded, standard 14 font resources are not complicated at all...
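Since Helvetica and ZapfDingbats (the standard 14 dingbat font, which is not glyph-for-glyph identical to the Wingdings asked about in the comments) belong to the standard 14 fonts, such a dictionary is all you need for them. A Java sketch with hypothetical names that produces objects like 68 0 and refuses other base fonts; WinAnsiEncoding mirrors iText's output above, while Symbol and ZapfDingbats are left without /Encoding because they use their built-in encodings:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper producing non-embedded standard 14 font objects.
final class Standard14 {

    // The standard 14 Type 1 fonts, which conforming readers must make
    // available (directly or via metrics and substitutes), so no embedding.
    static final Set<String> NAMES = new HashSet<>(Arrays.asList(
        "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
        "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
        "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
        "Symbol", "ZapfDingbats"));

    static String fontObject(int objNum, String baseFont) {
        if (!NAMES.contains(baseFont)) {
            throw new IllegalArgumentException(baseFont + " is not a standard 14 font; embed it instead");
        }
        // Symbol and ZapfDingbats use their built-in encodings.
        boolean symbolic = baseFont.equals("Symbol") || baseFont.equals("ZapfDingbats");
        return objNum + " 0 obj\n"
            + "<</Type/Font/Subtype/Type1/BaseFont/" + baseFont
            + (symbolic ? "" : "/Encoding/WinAnsiEncoding") + ">>\n"
            + "endobj\n";
    }

    public static void main(String[] args) {
        System.out.print(fontObject(68, "Helvetica-Bold"));
        System.out.print(fontObject(71, "ZapfDingbats"));
    }
}
```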

Now there are the additional content streams. iText does compress them, but I'll show them in an uncompressed state here:

69 0 obj
<</Length 1>>stream
  q
endstream
endobj
70 0 obj
<</Length 106>>stream 
  Q
  q
  0 1 -1 0 595.22 0 cm
  q
  BT
  1 0 0 1 36 540 Tm
  /Xi0 12 Tf
  0.75 g
  (Hello people!)Tj
  0 g
  ET
  Q
  Q
endstream
endobj 

So the new content stream at the start saves the current graphics state, and the new one at the end restores that saved state (and saves it again), changes the coordinate system, positions the text insertion point, selects the font, font size, and fill colour, and finally prints a string.
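The matrix in `0 1 -1 0 595.22 0 cm` compensates for the page's /Rotate 90: it maps the coordinates used afterwards onto the unrotated MediaBox so that, on the page as displayed, x runs left to right and y bottom to top. Coordinates that ignore the rotation therefore land in unexpected places, which may explain the positioning trouble. A small Java sketch (hypothetical names) applying a PDF matrix [a b c d e f] to a point:

```java
import java.util.Locale;

// Applies a PDF transformation matrix [a b c d e f] to a point:
// x' = a*x + c*y + e,  y' = b*x + d*y + f.
final class CtmMapping {

    static double[] apply(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    public static void main(String[] args) {
        // "0 1 -1 0 595.22 0 cm" from stream 70 0 (595.22 = MediaBox width).
        double[] rotate = {0, 1, -1, 0, 595.22, 0};
        double[] p = apply(rotate, 36, 540);
        System.out.println(String.format(Locale.ROOT, "(%.2f, %.2f)", p[0], p[1]));
        // prints (55.22, 36.00)
    }
}
```

So the text origin (36, 540) ends up at (55.22, 36) in unrotated MediaBox space, which after the viewer's 90° rotation is 36 units from the left and 540 units up on the displayed landscape page.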

mkl
  • Thank you for your reply. How can I add a new font to the PDF? In my case I want to add the Helvetica and Wingdings fonts. Will the code you gave work on its own? I am not well versed in C#. Is there any method for adding a new font to a PDF? – IT researcher Mar 04 '13 at 11:37
  • Please check my last comment. Also, I want to add these fonts as full sets, not subsets. – IT researcher Mar 04 '13 at 12:34
  • The code works in combination with the iTextSharp library. I don't know how to call .Net code from VB6 in a reliable manner, but as you did not negate my question regarding that, I assume you have an idea. Otherwise, "manually" adding a font means adding the required font resources to the PDF. Helvetica and Zapf Dingbats are among the standard 14 fonts; thus, there is no need to embed them. – mkl Mar 04 '13 at 12:40
  • I just added some information on which objects (both font and content) are added when running the C# code at the top of the answer. Now, especially with the specification at hand, all should be clear. – mkl Mar 04 '13 at 13:49
  • Have you added those objects using PDFNet? Is that paid or free? – IT researcher Mar 04 '13 at 14:35
  • I wrote *I use iTextSharp as an example* at the start. Its Java equivalent is iText, not PDFNet. – mkl Mar 04 '13 at 14:41
  • Thank you. I was able to add text to the PDF using iTextSharp. But I have got another small problem. Please see my question here: http://stackoverflow.com/questions/15218126/itextsharp-include-all-pages-from-the-input-file – IT researcher Mar 05 '13 at 07:32