I want to remove the bottom part of each page in a PDF without changing the page size. What is the recommended way to do this in Java with PDFBox? In other words, how can I remove the footer from each page of a PDF?

Is there possibly a way to use PDRectangle to just delete all text/images within it?

Here is a snippet of what I tried. Using a rectangle with setCropBox seems to lose the page size; maybe the CropBox is not intended for this?

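            // Copy the current crop box, but move its bottom edge up to y = 50
            // so the footer area falls outside the visible region.
            PDRectangle rectangle = new PDRectangle();
            rectangle.setUpperRightY(mypage.findCropBox().getUpperRightY());
            rectangle.setLowerLeftY(50);
            rectangle.setUpperRightX(mypage.findCropBox().getUpperRightX());
            rectangle.setLowerLeftX(mypage.findCropBox().getLowerLeftX());
            mypage.setCropBox(rectangle);
            // Add the page to the output document, then save and close it.
            croppedDoc.addPage(mypage);
            croppedDoc.save(filename);
            croppedDoc.close();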

The closest example I could find in the PDFBox cookbook shows how to remove an entire page, but that is not what I'm looking for. I'd like to delete just a few elements from the page: http://pdfbox.apache.org/userguide/cookbook.html

Ed Staub
Ville M
  • Why -1? I don't see any comments. At least explain why this is not a valid question, or point to other questions or sources. – Ville M Sep 12 '12 at 18:07
  • I agree that the person who downvoted this could've left a comment. However, one possible reason might have to do with your question itself - what you wrote does not show any research of your own. Perhaps you want to try something out or *if you already have*, let us know your roadblocks, and then perhaps people over here can help you out. It always helps to show [what have you tried](http://mattgemmell.com/2008/12/08/what-have-you-tried/) – Sujay Sep 12 '12 at 19:05
  • OK, here's a bit more background. I looked through the API for a function named something like "clip", which I had used in the past (in pdflib). I also looked through the cookbook examples and did not find one that covers cropping a page. I can see there is something called a CropBox, but I was not sure exactly how it is supposed to be used, and my attempt at using it caused the page size to change. Since both pdflib and iText have clear "clipping/cropping" examples, I thought somebody might have a similar example for PDFBox. – Ville M Sep 12 '12 at 20:03
  • I think it would be helpful if you edited the question and added all this information to the question itself. – Sujay Sep 12 '12 at 20:47
  • http://stackoverflow.com/questions/6831194/how-can-i-remove-all-images-drawings-from-a-pdf-file-and-leave-text-only-in-java – Alvin Pradeep Sep 20 '12 at 10:23
  • : cropbox,trimbox,ref:http://www.prepressure.com/pdf/basics/page_boxes. Try – Alvin Pradeep Sep 20 '12 at 10:55
  • Thanks, I looked at that question. For some reason the image-removal code in that example would not work with the logos in my PDF's footer. Anyway, I'm now thinking of using PDFStreamParser and removing tokens that match particular rules. – Ville M Sep 20 '12 at 22:04

2 Answers

I'm also a newbie, but take a look at this page, in particular, the description of TrimBox. If there's no TrimBox on the page, it defaults to CropBox, which would cause what you're seeing.

In general, don't expect the PDFBox docs to tell you much of anything about PDF itself - to use PDFBox well I think you need to go elsewhere - AFAIK, mostly just to the PDF specification. I haven't even skimmed it yet, though!

Ed Staub
  • Thanks for the tip on TrimBox. I tried setting it, but no luck yet; I'm still getting the same result. That is, after I create the rectangle, set both the TrimBox and the CropBox, and save, the document has the footer cut off but is no longer letter size. If I then restore the size to letter (mypage.setTrimBox(PDPage.PAGE_SIZE_LETTER)), the footer comes back. It seems like there should be a function to make the new size permanent, or a way to remove things that are not inside the CropBox/TrimBox. Anyway, I'll keep playing with these boxes, thanks. – Ville M Sep 14 '12 at 21:26
  • @VilleM - Is MediaBox set? I'm wondering if, in the first save, it's defaulting to CropBox. Also, see section 14.11.2 (page 627) of [the spec](http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf) to brainstorm more ideas. – Ed Staub Sep 15 '12 at 00:19
  • See my comment with the debugging output under the next answer. It seems like all those other dimensions are ignored even though I'm setting them; only the CropBox seems to matter. – Ville M Sep 17 '12 at 22:56
The CropBox is the way to go if you want to remove a portion of a page while keeping a rectangular region visible. If you want the page size to remain the same, you need the MediaBox to remain the same.

From the PDF Spec:

CropBox - rectangle (Optional; inheritable) A rectangle, expressed in default user space units, defining the visible region of default user space. When the page is displayed or printed, its contents are to be clipped (cropped) to this rectangle and then imposed on the output medium in some implementation-defined manner (see Section 10.10.1, “Page Boundaries”). Default value: the value of MediaBox.

MediaBox - rectangle (Required; inheritable) A rectangle (see Section 3.8.4, “Rectangles”), expressed in default user space units, defining the boundaries of the physical medium on which the page is intended to be displayed or printed (see Section 10.10.1, “Page Boundaries”).
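To make the relationship between the two boxes concrete, here is a minimal plain-Java sketch of the arithmetic. It does not use PDFBox at all; the `Box` class and `cropAboveFooter` helper are hypothetical names made up for illustration, and the 50-unit footer height is taken from the question. Assuming a letter-size MediaBox of 612 x 792 units, raising the bottom edge of the CropBox leaves the MediaBox untouched but shrinks the visible region, which is the size a viewer will typically report.

```java
public class PageBoxSketch {
    /** A PDF rectangle in default user space units, origin at the bottom-left. */
    static final class Box {
        final float llx, lly, urx, ury;

        Box(float llx, float lly, float urx, float ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        float width() { return urx - llx; }

        float height() { return ury - lly; }
    }

    /** Hypothetical helper: a crop box that hides the bottom footerHeight units. */
    static Box cropAboveFooter(Box mediaBox, float footerHeight) {
        return new Box(mediaBox.llx, mediaBox.lly + footerHeight,
                       mediaBox.urx, mediaBox.ury);
    }

    public static void main(String[] args) {
        Box mediaBox = new Box(0, 0, 612, 792);       // letter size, assumed
        Box cropBox = cropAboveFooter(mediaBox, 50);  // 50 units, as in the question

        // The MediaBox is unchanged, but the visible region is shorter.
        System.out.println("MediaBox height: " + mediaBox.height()); // 792.0
        System.out.println("CropBox height:  " + cropBox.height());  // 742.0
    }
}
```

So a smaller displayed page is the expected effect of a smaller CropBox, even when the MediaBox is still letter size.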

I have seen (faulty) applications and libraries that force the CropBox and the MediaBox to be the same. Double-check that this is not what is happening in your case.

Also take into account that the coordinate origin (0,0) in PDF is the bottom-left corner. Some libraries translate to a top-left origin for you and others do not, so you may want to double-check this for the library you are using.
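As a rough illustration of that translation, the sketch below converts a y value measured down from the top edge into PDF's bottom-left-origin y. The class and method names are hypothetical, and the 792-unit page height assumes a portrait letter page; whether your library needs this conversion at all is something to verify against its documentation.

```java
public class OriginFlip {
    /** Converts a y measured down from the top edge into a bottom-left-origin y. */
    static float topDownToPdfY(float pageHeight, float yFromTop) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        float letterHeight = 792f; // portrait letter page, assumed

        // A footer line that starts 742 units below the top edge
        // sits at y = 50 in PDF coordinates.
        System.out.println(topDownToPdfY(letterHeight, 742f)); // 50.0
    }
}
```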

yms
  • OK, thanks, I think you could be right; maybe PDFBox is not operating according to the spec you describe above. I set the CropBox to my cropped rectangle and the MediaBox to LETTER size, and yet the resulting PDF is cropped but its size is 11x7.64, i.e. the page size changed: mypage.setCropBox(rectangle); mypage.setMediaBox(PDPage.PAGE_SIZE_LETTER); I will try to debug this in more detail next. – Ville M Sep 17 '12 at 21:55
  • Here's some debugging output. I would think Adobe Acrobat should show this in LETTER size, but it does not; it almost looks like all the other dimensions are ignored except the CropBox: old cropBox:612.0 old mediabox:612.0 old trimBox:612.0 old bleedBox:612.0 new cropBox:550.0 new mediabox:612.0 new trimBox:612.0 new bleedBox:612.0 – Ville M Sep 17 '12 at 22:54
  • Can you post a link to the old file and the new file? I do not have a Java development environment ready but I could get some ideas by looking at those files. – yms Sep 18 '12 at 13:06
  • Thanks for the offer, but sorry, I can't share them because of the data in them. Anyway, I'm now thinking of using PDFStreamParser and removing tokens that match particular rules instead of using the CropBox, since I can't get it to do what I'd like. – Ville M Sep 20 '12 at 22:05
  • @Ville M You do not need to send me your production files, any PDF file processed by your application that shows the problem will do. – yms Sep 21 '12 at 13:26