Issue in Removing Header and Footer in PDF using iText PDF

Question

I am using itext-xtra-5.5.6 api to remove/cleanup the header and footer.

Here is the code

//removes header and footer based on the configuration
public static void cleanUpContent(String inPDFFile,String targetPDFFile,PDFConfig pdfConfig) throws Exception{
    PdfReader reader = new PdfReader(inPDFFile);
    OutputStream outputStream = new FileOutputStream(targetPDFFile);
    float upperY=pdfConfig.getPdfUpperY();
    float lowerY=pdfConfig.getPdfLowerY();
    boolean highLightColor=pdfConfig.isPdfHighLightClippedTextColor();
    PdfStamper stamper = new PdfStamper(reader, outputStream);
    List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();

    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        Rectangle pageRect = reader.getCropBox(i);  
        Rectangle headerRect= new Rectangle(pageRect);
        headerRect.setBottom(headerRect.getTop()-upperY);               
        Rectangle footerRect= new Rectangle(pageRect);
        footerRect.setTop(footerRect.getBottom()+lowerY);   

        if(highLightColor){
            cleanUpLocations.add(new PdfCleanUpLocation(i, headerRect,BaseColor.GREEN));
            cleanUpLocations.add(new PdfCleanUpLocation(i, footerRect,BaseColor.GREEN));
        }else{
            cleanUpLocations.add(new PdfCleanUpLocation(i, headerRect));
            cleanUpLocations.add(new PdfCleanUpLocation(i, footerRect));
        }
    }   
    PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
    try{
        cleaner.cleanUp();
    }catch(Exception e){
         e.printStackTrace();
    }

    stamper.close();
    reader.close();
    outputStream.flush();
    outputStream.close();
}

When I run the code on a PDF file with 1440 pages using upperY=65 and lowerY=65, it deletes all the content from the page. With upperY=65 and lowerY=45, it deletes just the header and footer, which is the expected result.

Another issue is a NullPointerException for some pages in the DefaultClipper class:

private void fixupFirstLefts2( OutRec OldOutRec, OutRec NewOutRec ) {
    for (final OutRec outRec : polyOuts) {
        if (outRec.firstLeft.equals( OldOutRec )) {
            outRec.firstLeft = NewOutRec;
        }
    }
}

For an outRec in polyOuts, firstLeft is null, so the call to outRec.firstLeft.equals throws the NullPointerException.

Exception stack trace

java.lang.NullPointerException
    at com.itextpdf.text.pdf.parser.clipper.DefaultClipper.fixupFirstLefts2(DefaultClipper.java:1463)
    at com.itextpdf.text.pdf.parser.clipper.DefaultClipper.joinCommonEdges(DefaultClipper.java:2121)
    at com.itextpdf.text.pdf.parser.clipper.DefaultClipper.executeInternal(DefaultClipper.java:1420)
    at com.itextpdf.text.pdf.parser.clipper.DefaultClipper.execute(DefaultClipper.java:1362)
    at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpRegionFilter.filterFillPath(PdfCleanUpRegionFilter.java:174)
    at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpRenderListener.filterCurrentPath(PdfCleanUpRenderListener.java:402)
    at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpRenderListener.renderPath(PdfCleanUpRenderListener.java:232)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.paintPath(PdfContentStreamProcessor.java:377)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.access$6300(PdfContentStreamProcessor.java:60)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$PaintPath.invoke(PdfContentStreamProcessor.java:1183)
    at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpContentOperator.invoke(PdfCleanUpContentOperator.java:138)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(PdfContentStreamProcessor.java:286)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:429)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$FormXObjectDoHandler.handleXObject(PdfContentStreamProcessor.java:1252)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.displayXObject(PdfContentStreamProcessor.java:352)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.access$6100(PdfContentStreamProcessor.java:60)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$Do.invoke(PdfContentStreamProcessor.java:988)
    at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpContentOperator.invoke(PdfCleanUpContentOperator.java:138)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(PdfContentStreamProcessor.java:286)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:429)
    at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor.cleanUpPage(PdfCleanUpProcessor.java:160)
    at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor.cleanUp(PdfCleanUpProcessor.java:135)

I am not sure where I am making a mistake. I even checked whether the PDF pages contain images or other content types, but the pages are purely text based. Please help me resolve these two issues.

As this behavior is related to the internals of your PDF, please supply it or at least an excerpt containing a few of its pages sufficient to reproduce the issues. — mkl, Sep 08 '15 at 08:55
How do I upload a PDF file to my post? I don't see an option to do so. — vdeveloper, Sep 08 '15 at 14:21
Stackoverflow unfortunately does not allow generic file uploads, only image uploads. Usually people upload their big sample data to some file sharing service (e.g. public shares on google drive or dropbox; please don't use file sharing services drowning the downloader in ads) and publish the URL here. — mkl, Sep 08 '15 at 15:07
Thanks. Here you go: https://www.dropbox.com/s/xznwx4ogemsgd42/spec.pdf?dl=0 — vdeveloper, Sep 08 '15 at 15:16
Indeed, I get a similar exception as you do when testing with iText 5.5.6. When testing with the current development state of 5.5.7-SNAPSHOT, though, I don't get an exception. Thus, it seems that the iText people had already been aware of the issue and fixed it. Unfortunately, though, there still is an issue, the landscape pages of your sample file are not properly handled yet. — mkl, Sep 09 '15 at 08:16
Thanks MKL!! Anyone else know how the landscape pages are handled based on my pdf file? — vdeveloper, Sep 10 '15 at 11:08
@vdeveloper Where is the `PdfCleanUpLocation` class? Can you please share the source code for removing the header and footer from an existing PDF? — Akash Chavda, Jan 22 '21 at 07:38

score 1 · Answer 1 · answered Sep 10 '15 at 13:04

Concerning the observations by the OP, an exception like the one presented by the OP indeed is thrown when running the OP's code with his sample file and iText and iText-xtra 5.5.6. Furthermore, the page on which this happens is empty in the result PDF.

The cause for the exception indeed is some bug, and the cause for the empty page is that the cleanup code for each processed page first removes the former content and then starts building the new content; if an exception occurs early while processing the page as in the case at hand, the result can be an empty page.
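To illustrate that failure mode, here is a minimal toy model in plain Java. It is not iText code: ToyPage, ToyCleaner, and the "poison" operation are hypothetical names made up for this sketch, and the real cleanup works on content streams rather than string lists. It only shows why "clear first, rebuild afterwards" combined with a swallowed exception can leave a page empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these are NOT iText classes.
class ToyPage {
    final List<String> content = new ArrayList<>();

    ToyPage(List<String> initial) {
        content.addAll(initial);
    }
}

class ToyCleaner {
    // Mimics the order described above: drop the old content, then rebuild it.
    static void cleanUp(ToyPage page, String poison) {
        List<String> old = new ArrayList<>(page.content);
        page.content.clear();                      // step 1: former content is removed
        for (String op : old) {                    // step 2: new content is built
            if (op.equals(poison)) {
                // stands in for the bug that surfaces while filtering a path
                throw new IllegalStateException("simulated failure on: " + op);
            }
            if (!op.startsWith("header") && !op.startsWith("footer")) {
                page.content.add(op);              // keep everything outside the strips
            }
        }
    }

    // Catches and ignores the exception, like the try/catch around cleaner.cleanUp().
    static ToyPage cleanUpSwallowing(List<String> ops, String poison) {
        ToyPage page = new ToyPage(ops);
        try {
            cleanUp(page, poison);
        } catch (RuntimeException e) {
            // ignored on purpose for the demonstration
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("path", "header text", "body text", "footer text");
        // No operation matches "none": only header and footer are dropped.
        System.out.println(cleanUpSwallowing(ops, "none").content);
        // The very first operation fails: nothing has been rebuilt yet.
        System.out.println(cleanUpSwallowing(ops, "path").content);
    }
}
```

In this model, a failure on the first operation leaves the page with no content at all, while a failure later on would leave it partially rebuilt. Letting the exception propagate instead of catching it would at least make such pages visible to the caller.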

Meanwhile, though, the bug has been fixed: in a current 5.5.7 development snapshot the exception no longer occurs.

A different unwanted effect occurs, though: the OP's sample document contains some rotated pages, e.g. page 18:

Applying the code as is to it, one gets:

The reason for this is that the PdfStamper usually tries to treat rotated portrait pages as if they were true landscape pages. As the PdfCleanUpProcessor is rotation-unaware, this results in mayhem.

One can tell it not to do so, though, using the setRotateContents setter:

    ...
    PdfStamper stamper = new PdfStamper(reader, outputStream);
    stamper.setRotateContents(false);
    List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
    ...

This updated code now produces:
