
I am setting a margin for a pdf and checking if the contents of the page are exceeding the margin.

I am easily able to do that if the contents of a page are just text.

Here's what I am doing:

I am using TextMarginFinder. I set the left margin of the PDF based on the book size and compare it with finder.getLlx(), which returns the leftmost position of any text on the page.

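// Assumption: parser is a PdfReaderContentParser for the document, i is the page number
TextMarginFinder finder = parser.processContent(i, new TextMarginFinder());
if (leftmar >= finder.getLlx())
{
    errormargin = 1; // left margin error
    System.out.println("Page: " + i + " Margin Error: LeftMarginError");
}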

But this does not work if the page contains an image. Even when the image extends beyond the margin, the code above reports no error, since finder.getLlx() seems to consider text only.

Two Questions:

1) While looping through the pages of the PDF, how can I check whether a particular page contains an image?

2) If it contains an image, how can I obtain its extreme positions?

Update after mkl's suggestion

if (leftmar >= finder.getLlx()) {
    errormargin = 1; // left margin error
    System.out.println("finder.getLlx() value =" + finder.getLlx() + ", leftmar Value=" + leftmar);
}

if (rightmar <= finder.getUrx()) {
    errormargin = 1; // right margin error
    System.out.println("finder.getUrx() value =" + finder.getUrx() + ", rightmar Value=" + rightmar);
}

if (margintop >= finder.getUry()) {
    errormargin = 3; // top margin error
    System.out.println("finder.getUry() value =" + finder.getUry() + ", margintop Value=" + margintop);
}

if (marginbottom >= finder.getLly()) {
    errormargin = 3; // bottom margin error
    System.out.println("finder.getLly() value =" + finder.getLly() + ", marginbottom Value=" + marginbottom);
}
Abhinav
  • So essentially you want an image to be constrained by the margin (i.e. not expand over it) or do you want a flag to be raised when one does pass over (so that you may do an action)? – insidesin Sep 04 '15 at 09:24
  • Exactly, that's my constraint. – Abhinav Sep 04 '15 at 09:25
  • *I am using `TextMarginFinder`* - consider using [MarginFinder](https://github.com/mkl-public/testarea-itext5/blob/master/src/main/java/mkl/testarea/itext5/content/MarginFinder.java) instead, a class used in [this answer](http://stackoverflow.com/a/20212172/1729265). This class also considers bitmap images and vector graphics; for the latter ones it is in a proof-of-concept state, though. – mkl Sep 04 '15 at 09:45
  • In fact, I am using the same class to find the position of text, which works, but it does not work for images. Can you provide a code snippet or something related to what you are saying? – Abhinav Sep 04 '15 at 09:48
  • *In fact, I am using the same class to find the position of text* - Are you sure you already use the class I linked to? In that case, can you share a sample PDF in which it does not return the position of arbitrary content, including images? *Can you provide a code snippet* - see the answer I linked to. – mkl Sep 04 '15 at 10:05
  • Oh wait, I am sorry, not that class. Let me check and get back. – Abhinav Sep 04 '15 at 10:11
  • Hi @mkl, thanks for pointing me to that class. Can you explain a bit how to use it for my case? I'm sorry if that sounds stupid; I am more of a PHP developer. – Abhinav Sep 04 '15 at 10:27
  • You said you already use the `TextMarginFinder` (I assume you mean the one from the iText jar). You can simply use the `MarginFinder` class as a replacement for it. If it does not work for you, please show your code (working for `TextMarginFinder`), then we should be able to tweak it for your use case. – mkl Sep 04 '15 at 10:34
  • @mkl: Hi, thank you so much for your suggestion. I did accordingly, although I feel that I have been doing something wrong from the beginning. I have updated my code, please check it. The `getUrx()` method seems to give me the entire width, which shouldn't be the case. – Abhinav Sep 04 '15 at 11:08
  • `getUrx` should give you the x coordinate of the rightmost content part. If in your case it goes as far as the right end of the page, then there likely is some content going that far. E.g. some software first draws a white rectangle covering the whole page. Whether that is the case or something more subtle, I cannot tell without the sample PDF. – mkl Sep 04 '15 at 12:49
  • Hi @mkl, I got the solution. What `getUrx` gives is the x coordinate from right to left; I just had to subtract `getUrx()` from my PDF `width` and compare the result with my margin value. Thank you so much for your class :) – Abhinav Sep 05 '15 at 09:28
  • Maybe you should post that as an answer; I will accept it :) – Abhinav Sep 05 '15 at 09:29
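
The subtraction described in the last comments can be sketched roughly as follows. This is only an illustration, not code from the thread: `MarginCheck`, `Side` and `violations` are hypothetical names, the margins are assumed to be widths measured inwards from the page edges, and the page is assumed to have its lower left corner at (0, 0). The bounding box values are ordinary page coordinates counted from the left and from the bottom, which is why the right and top margins need the page width and height.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper, not part of iText or MarginFinder. */
final class MarginCheck {

    enum Side { LEFT, RIGHT, TOP, BOTTOM }

    /**
     * Returns the sides on which the content box (llx, lly, urx, ury) reaches the margin.
     * Assumptions: margins are widths measured inwards from the page edges and the
     * lower left page corner is at (0, 0). As in the question's code, content that
     * just touches a margin counts as a violation.
     */
    static Set<Side> violations(float pageWidth, float pageHeight,
                                float left, float right, float top, float bottom,
                                float llx, float lly, float urx, float ury) {
        Set<Side> result = EnumSet.noneOf(Side.class);
        if (llx <= left) result.add(Side.LEFT);
        if (pageWidth - urx <= right) result.add(Side.RIGHT);   // distance from the right edge
        if (pageHeight - ury <= top) result.add(Side.TOP);      // distance from the top edge
        if (lly <= bottom) result.add(Side.BOTTOM);
        return result;
    }

    public static void main(String[] args) {
        // Page of 420 x 595 pt with 36 pt margins; the content sticks out on the right.
        Set<Side> sides = violations(420f, 595f, 36f, 36f, 36f, 36f,
                                     40f, 50f, 400f, 550f);
        System.out.println(sides); // prints "[RIGHT]"
    }
}
```

If the page box does not start at (0, 0), the box origin would have to be subtracted from the coordinates first.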

1 Answer


This is more an answer to what the OP actually wanted, a way to retrieve the bounding box of all content on a page.

The OP already uses the iText TextMarginFinder render listener class to determine the bounding box of the text on a page. In the context of this answer an analogous class MarginFinder has been developed which considers not only text but also other kinds of content, e.g. bitmap images and vector graphics.

Thus, replacing TextMarginFinder with MarginFinder makes it possible to find the bounding box of any content on the page.
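
As background on where an image's extreme positions come from: PDF paints a bitmap image into the unit square, and the transformation matrix in effect at that moment maps the square onto the page. In iText 5 a render listener can read that matrix from `ImageRenderInfo.getImageCTM()`. The following is a minimal sketch of the geometry only, not the MarginFinder source; `ImageBounds` and `of` are hypothetical names, and the six parameters are assumed to be the usual PDF matrix entries [a b c d e f].

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical helper illustrating the geometry; not the MarginFinder code itself. */
final class ImageBounds {

    /**
     * Bounding box of the unit square after applying the PDF matrix [a b c d e f],
     * i.e. x' = a*x + c*y + e and y' = b*x + d*y + f.
     */
    static Rectangle2D.Float of(float a, float b, float c, float d, float e, float f) {
        // Images of the corners (0,0), (1,0), (0,1) and (1,1)
        float[] xs = { e, a + e, c + e, a + c + e };
        float[] ys = { f, b + f, d + f, b + d + f };
        float minX = xs[0], maxX = xs[0], minY = ys[0], maxY = ys[0];
        for (int i = 1; i < 4; i++) {
            minX = Math.min(minX, xs[i]);
            maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]);
            maxY = Math.max(maxY, ys[i]);
        }
        return new Rectangle2D.Float(minX, minY, maxX - minX, maxY - minY);
    }

    public static void main(String[] args) {
        // A 200 x 100 pt image placed with its lower left corner at (50, 700)
        Rectangle2D.Float box = of(200f, 0f, 0f, 100f, 50f, 700f);
        System.out.println(box.getMinX() + " " + box.getMinY() + " "
                + box.getMaxX() + " " + box.getMaxY());
    }
}
```

Taking all four corners rather than just two keeps the result correct for rotated or skewed images.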

Please be aware:

  • Any content is considered; the margin finder does not check whether the content makes a visible difference. E.g. think about white text, white bitmap areas, or white rectangles: all are considered content and, therefore, the bounding box encompasses such invisible content, too. Especially the latter example, white rectangles, might be a problem here or there as some software first paints a white rectangle over the whole page area.

  • Clipping paths are not considered. Thus, even content that never is drawn (because it is clipped away) makes the bounding box expand.

  • Page borders are not considered, either. Thus, off-page content like printer marks may make the bounding box expand even more.

  • The code calculating the bounding box for vector graphics is not exact: it simply returns the bounding box of all control points, which in case of Bezier curves may be larger than the curve itself. It also ignores line widths and wedge types, which results in somewhat-off coordinates.

  • Annotations are not considered. Thus, the resulting bounding box may be too small if annotations are expected to also be considered, e.g. for forms.

In spite of these shortcomings, the render listener usually returns correct results. If this is not enough, the class can be extended accordingly.
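
As an illustration of one such extension, the exact extent of a cubic Bezier curve can be computed instead of the extent of its control points. The sketch below is not part of MarginFinder; `BezierExtent`, `at` and `range` are hypothetical names, and it handles a single coordinate (x or y) at a time. It looks for the parameters where the derivative vanishes and evaluates the curve there as well as at both end points.

```java
/** Hypothetical helper; MarginFinder itself uses the control points as described above. */
final class BezierExtent {

    /** Value of one coordinate of a cubic Bezier curve at parameter t. */
    static double at(double p0, double p1, double p2, double p3, double t) {
        double u = 1 - t;
        return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
    }

    /** Returns {min, max} of one coordinate of the curve for t in [0, 1]. */
    static double[] range(double p0, double p1, double p2, double p3) {
        double min = Math.min(p0, p3);
        double max = Math.max(p0, p3);
        // A third of the derivative is a*t^2 + b*t + c
        double a = p3 - 3 * p2 + 3 * p1 - p0;
        double b = 2 * (p2 - 2 * p1 + p0);
        double c = p1 - p0;
        double[] roots;
        if (Math.abs(a) < 1e-12) {
            // Derivative is linear or constant
            roots = Math.abs(b) < 1e-12 ? new double[0] : new double[] { -c / b };
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                roots = new double[0];
            } else {
                double s = Math.sqrt(disc);
                roots = new double[] { (-b + s) / (2 * a), (-b - s) / (2 * a) };
            }
        }
        for (double t : roots) {
            if (t > 0 && t < 1) { // only extrema inside the curve segment count
                double v = at(p0, p1, p2, p3, t);
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new double[] { min, max };
    }

    public static void main(String[] args) {
        // y coordinates: the curve starts and ends at 0, both inner control points are at 10
        double[] r = range(0, 10, 10, 0);
        System.out.println(r[0] + " .. " + r[1]);
    }
}
```

In the example the control points reach up to 10 while the curve itself only reaches 7.5, which is the kind of overestimate the caveat describes.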

PS: Anyone who is interested in the original question may find answers in the MarginFinder render listener class and its use.

mkl
  • My code was working but suddenly stopped, and I am getting the error below and I don't know why :( Can you please help me out with this? `Error: Exception in thread "main" java.lang.NullPointerException at mkl.testarea.itext5.content.MarginFinder.getLlx(MarginFinder.java:53) at itext.MarginFinderFinal.addFullMarginRectangle(MarginFinderFinal.java:184) at itext.MarginFinderFinal.main(MarginFinderFinal.java:246) `. MarginFinderFinal is my class where I have written the business logic. – Abhinav Sep 22 '15 at 06:39
  • When I tried the following in this function of your class, I found that textRectangle is null, i.e. the `if` condition evaluated to true: `public float getLlx() { if (textRectangle == null){ System.out.println("It is null"); } } `. Why would that be null? – Abhinav Sep 22 '15 at 07:00
  • You are right, that `textRectangle` is `null` until the first bit of content with location and extent flows in. Thus, e.g. an essentially empty page may leave it `null`. If the page in question is not empty, could you share it for analysis? Because I can think of no actual content *completely* missed by the `MarginFinder`. – mkl Sep 22 '15 at 07:50
  • Oh man, thank you so much... you are right. I got that error on the second page itself and, guess what, that's a blank page. I had been pulling my hair out since yesterday. Thank you so much, I have added the necessary condition for when it encounters `null` :) – Abhinav Sep 22 '15 at 09:28
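
The kind of condition mentioned in the last comment might look like the sketch below. It is not the MarginFinder source: `ContentBounds` is a hypothetical stand-in that only mimics the behavior described above, namely a rectangle that stays `null` until the first piece of content arrives. Returning an `Optional` is one possible design that forces the caller to handle blank pages explicitly.

```java
import java.awt.geom.Rectangle2D;
import java.util.Optional;

/** Hypothetical stand-in for a margin finder: the rectangle stays null until content arrives. */
final class ContentBounds {
    private Rectangle2D.Float rectangle = null;

    /** Grows the bounding box by one piece of content. */
    void add(float x, float y, float width, float height) {
        Rectangle2D.Float piece = new Rectangle2D.Float(x, y, width, height);
        if (rectangle == null) {
            rectangle = piece;
        } else {
            rectangle.add(piece); // union with the box collected so far
        }
    }

    /** Empty for a blank page instead of failing with a NullPointerException. */
    Optional<Rectangle2D.Float> bounds() {
        return Optional.ofNullable(rectangle);
    }

    public static void main(String[] args) {
        ContentBounds blankPage = new ContentBounds();
        System.out.println(blankPage.bounds().isPresent()); // prints "false"

        ContentBounds page = new ContentBounds();
        page.add(10f, 10f, 5f, 5f);
        page.add(0f, 20f, 2f, 2f);
        page.bounds().ifPresent(box ->
                System.out.println(box.getMinX() + " " + box.getMaxX()));
    }
}
```

A margin check would then simply be skipped for pages whose bounds are empty.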