I am using the code below to find the y coordinate of the last line in a PDF using iTextSharp:

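Dim pdfReader As New PdfReader("D:/sample.pdf")
' GetLly() returns a Single; an Integer would silently round the value
Dim y_coordinate As Single
Dim parser As New PdfReaderContentParser(pdfReader)
Dim finder As TextMarginFinder
' pagen0 is the 1-based page number, defined elsewhere
finder = parser.ProcessContent(pagen0, New TextMarginFinder())
y_coordinate = finder.GetLly()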

If the PDF was created directly, I get the correct y coordinate. But when I convert an MS Word document to PDF and run the above code on that PDF, it returns the position of the Word document's margin in the PDF (y coordinate) instead of the position where the text ends. Please help me find the correct y coordinate where the text ends in a PDF that was converted from Word. Here is a link to such a PDF: https://www.dropbox.com/s/ha1vrk58umuv3h7/PACACH0123.pdf?dl=0

Satish

1 Answer

Each page of your sample document contains text drawing instructions which draw space characters with a base line at y coordinate 39:

BT
/F2 14.04 Tf
1 0 0 1 72.024 39.024 Tm
[( )] TJ
ET
BT
1 0 0 1 306.05 39.024 Tm
[( )] TJ
ET
BT
1 0 0 1 397.63 39.024 Tm
[( )] TJ
ET

and none below that.

Thus, your code correctly returns 39 + descent as the bottom of the last line: the space characters count as text, even though they are invisible.
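
To make the effect concrete, here is a minimal sketch in plain C# without iTextSharp. It only models the idea: each chunk is a hypothetical (text, y) pair, and the two values are taken from this post (39.024 is the base line of the space characters, with the descent ignored for simplicity; 81.92254 is the filtered bottom of page 1). The `BottomDemo` class and its `Bottom` method are invented for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BottomDemo
{
    // Lowest y over all chunks; optionally ignores whitespace-only chunks.
    public static float Bottom(IEnumerable<(string Text, float Y)> chunks, bool skipSpaces)
    {
        return chunks
            .Where(c => !skipSpaces || c.Text.Trim().Length > 0)
            .Min(c => c.Y);
    }

    static void Main()
    {
        // Illustrative data only: one visible line and one invisible space below it.
        var chunks = new List<(string Text, float Y)>
        {
            ("last visible line", 81.92254f),
            (" ", 39.024f),
        };

        // Unfiltered: the space near the page bottom determines the result.
        Console.WriteLine(Bottom(chunks, false));
        // Filtered: only visible text is considered.
        Console.WriteLine(Bottom(chunks, true));
    }
}
```

The unfiltered call is dominated by the space at 39.024, while the filtered call reports the visible line. Note that `Min` throws on an empty sequence, so a page without any visible text would need separate handling.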


To get around this problem, you can employ the method explained and outlined in Java/iText in this answer to "TextMarginFinder to verify printability", i.e. by ignoring all space characters while calculating the text bounding box:

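// Requires: using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
// 'source' is the path of the PDF file to inspect.
using (PdfReader pdfReader = new PdfReader(source))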
{
    System.Console.Write("\n*\n*\n* Filtered last lines per page of {0}\n*\n*\n", source);
    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
    {
        PdfReaderContentParser parser = new PdfReaderContentParser(pdfReader);
        TextMarginFinder finder = new TextMarginFinder();
        FilteredRenderListener filtered = new FilteredRenderListener(finder, new SpaceFilter());
        parser.ProcessContent(page, new TextRenderInfoSplitter(filtered));
        System.Console.Write("Page {0}, Bottom y {1}\n", page, finder.GetLly());
    }
}

with these two helper classes

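// Forwards each text chunk character by character, so that spaces
// inside a longer chunk can be filtered individually.
class TextRenderInfoSplitter : IRenderListener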
{
    public TextRenderInfoSplitter(IRenderListener strategy) {
        this.strategy = strategy;
    }

    public void RenderText(TextRenderInfo renderInfo) {
        foreach (TextRenderInfo info in renderInfo.GetCharacterRenderInfos()) {
            strategy.RenderText(info);
        }
    }

    public void BeginTextBlock() {
        strategy.BeginTextBlock();
    }

    public void EndTextBlock() {
        strategy.EndTextBlock();
    }

    public void RenderImage(ImageRenderInfo renderInfo) {
        strategy.RenderImage(renderInfo);
    }

    IRenderListener strategy;
}

// Accepts only text chunks containing at least one non-whitespace character.
class SpaceFilter : RenderFilter
{
    public override bool AllowText(TextRenderInfo renderInfo)
    {
        return renderInfo != null && renderInfo.GetText().Trim().Length > 0;
    }
}

The output for your sample document is:

*
*
* Filtered last lines per page of PACACH0123.pdf
*
*
Page 1, Bottom y 81,92254
Page 2, Bottom y 413,1685
Page 3, Bottom y 688,4785

This looks more like the numbers you are after.

mkl
  • Thank you very much, the given coordinates are correct. Please share the full source code for iTextSharp. – Satish Jun 07 '16 at 03:16
  • It's working perfectly for the first 2 pages. On the 3rd page there is an image; I think it's ignoring the image also. Please advise. – Satish Jun 07 '16 at 03:37
  • *please share the full source code for itextsharp.* - what are you missing?... *i thinks its ignoring image also* - yes, after all you are using a `TextMarginFinder`... – mkl Jun 07 '16 at 04:10
  • You can easily write a more generic `MarginFinder` which also takes bitmaps and vector graphics into account, cf. [MarginFinder.java](https://github.com/mkl-public/testarea-itext5/blob/master/src/main/java/mkl/testarea/itext5/content/MarginFinder.java) from [this answer](http://stackoverflow.com/a/20212172/1729265). It is in Java but should be easy to translate to c#. – mkl Jun 07 '16 at 04:20
  • @Satish Were you able to translate the iText/Java [MarginFinder](https://github.com/mkl-public/testarea-itext5/blob/master/src/main/java/mkl/testarea/itext5/content/MarginFinder.java) to .Net/iTextSharp? – mkl Jun 10 '16 at 09:14
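
For the image case raised in the comments, the following is a rough sketch of a listener that tracks both non-space text and bitmap images. It is not a translation of the linked MarginFinder.java: the class name `TextAndImageMarginFinder` is made up for illustration, vector graphics are not covered, and it assumes the iTextSharp 5.x parser API (`IRenderListener`, `TextRenderInfo.GetDescentLine()`, `ImageRenderInfo.GetImageCTM()`, `Vector.Cross(Matrix)`), so please check it against the version you use.

```csharp
using iTextSharp.text.pdf.parser;

// Hypothetical listener: bounding box of visible text and bitmap images.
class TextAndImageMarginFinder : IRenderListener
{
    float llx = float.MaxValue, lly = float.MaxValue;
    float urx = float.MinValue, ury = float.MinValue;

    // Grows the bounding box so that it contains the given point.
    void Include(Vector point)
    {
        float x = point[Vector.I1], y = point[Vector.I2];
        if (x < llx) llx = x;
        if (x > urx) urx = x;
        if (y < lly) lly = y;
        if (y > ury) ury = y;
    }

    public void RenderText(TextRenderInfo renderInfo)
    {
        // Look at single characters so spaces inside longer chunks are skipped too.
        foreach (TextRenderInfo info in renderInfo.GetCharacterRenderInfos())
        {
            if (info.GetText().Trim().Length == 0) continue;
            LineSegment descent = info.GetDescentLine();
            LineSegment ascent = info.GetAscentLine();
            Include(descent.GetStartPoint());
            Include(descent.GetEndPoint());
            Include(ascent.GetStartPoint());
            Include(ascent.GetEndPoint());
        }
    }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        // An image occupies the unit square mapped through its transformation matrix.
        Matrix ctm = renderInfo.GetImageCTM();
        Include(new Vector(0, 0, 1).Cross(ctm));
        Include(new Vector(1, 0, 1).Cross(ctm));
        Include(new Vector(0, 1, 1).Cross(ctm));
        Include(new Vector(1, 1, 1).Cross(ctm));
    }

    public void BeginTextBlock() { }
    public void EndTextBlock() { }

    public float GetLlx() { return llx; }
    public float GetLly() { return lly; }
    public float GetUrx() { return urx; }
    public float GetUry() { return ury; }
}
```

It would be used like the `TextMarginFinder` above, e.g. `parser.ProcessContent(page, new TextAndImageMarginFinder()).GetLly()`. If a page contains neither visible text nor images, the getters return the initial sentinel values, so callers may want to check for that.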