
I downloaded a PDF file from a website, and every page has a hyperlink to that site inside a rectangle. I want to remove the link from every page. I am using PDFBox version 2.0.8.

I found out that the link is described in the /Annots array of each page of the document. I deleted the /Annots entries corresponding to the link. Of course, I also set the needToBeUpdated flag to true on every node in the chain from the PDF catalog. In debug mode I see that the readOnly flag is set to true in the AccessPermission object. When I open the edited PDF file, all pages are empty, and for every page Acrobat Reader shows the following error:

There was an error processing a page. Invalid Function resource.

I have several questions:

  1. Can I modify the PDF file programmatically when the readOnly flag in the AccessPermission object is set to true?
  2. Why do I get the error described above?
  3. What do I need to do to remove the unwanted link from each page so that every page in the PDF document still displays properly?
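To check the two things mentioned above (which entries sit in each page's /Annots array, and what AccessPermission reports), it can help to print them before changing anything. The following is a minimal sketch for PDFBox 2.0, assuming the file opens without a user password; the class name InspectAnnotations and the command-line argument are hypothetical, not part of the original code.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;

public class InspectAnnotations {

    /** Returns one line per annotation: 1-based page number, subtype and rectangle. */
    static List<String> describeAnnotations(PDDocument document) throws IOException {
        List<String> lines = new ArrayList<>();
        int pageNumber = 1;
        for (PDPage page : document.getPages()) {
            // getAnnotations() wraps the entries of the page's /Annots array
            for (PDAnnotation annotation : page.getAnnotations()) {
                lines.add("page " + pageNumber + ": " + annotation.getSubtype()
                        + " " + annotation.getRectangle());
            }
            pageNumber++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the PDF (hypothetical command-line layout)
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            AccessPermission ap = document.getCurrentAccessPermission();
            System.out.println("encrypted=" + document.isEncrypted()
                    + ", readOnly=" + ap.isReadOnly()
                    + ", canModify=" + ap.canModify()
                    + ", canModifyAnnotations=" + ap.canModifyAnnotations());
            for (String line : describeAnnotations(document)) {
                System.out.println(line);
            }
        }
    }
}
```

If the link shows up as a `Link` entry on every page, the high-level PDAnnotation classes are enough to work with it, without casting raw COS objects.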

Here is my code (sorry for the quality, it is only a draft):

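    import java.io.File;
    import java.io.FileOutputStream;

    import org.apache.pdfbox.cos.COSArray;
    import org.apache.pdfbox.cos.COSDictionary;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.cos.COSObject;
    import org.apache.pdfbox.pdmodel.PDDocument;

    // path_to_pdf_file and path_to_save_file are String paths defined elsewhere
    File book = new File(path_to_pdf_file);
    PDDocument document = PDDocument.load(book);
    document.setAllSecurityToBeRemoved(true);

    // Catalog: remove /Perms and mark the catalog as modified
    COSDictionary dictionary = document.getDocumentCatalog().getCOSObject();
    dictionary.removeItem(COSName.PERMS);
    dictionary.setNeedToBeUpdated(true);

    // Page tree root and its /Kids array
    ((COSObject) document.getDocumentCatalog().getCOSObject().getItem(COSName.PAGES)).setNeedToBeUpdated(true);
    dictionary = document.getDocumentCatalog().getPages().getCOSObject();
    dictionary.setNeedToBeUpdated(true);
    COSArray arr = (COSArray) dictionary.getDictionaryObject(COSName.KIDS);
    arr.setNeedToBeUpdated(true);

    COSArray arrayForLoop;
    COSDictionary tempDic;
    // Assumes a flat page tree: every /Kids entry is an indirect page object
    for (int k = 0; k < arr.size(); ++k) {
        COSObject object = (COSObject) arr.get(k);
        object.setNeedToBeUpdated(true);

        dictionary = (COSDictionary) object.getObject();
        dictionary.setNeedToBeUpdated(true);
        // Assumes /Annots is a direct array in the page dictionary
        arrayForLoop = (COSArray) dictionary.getItem(COSName.ANNOTS);
        arrayForLoop.setNeedToBeUpdated(true);

        arrayForLoop = (COSArray) arrayForLoop.getCOSObject();
        arrayForLoop.setNeedToBeUpdated(true);
        // First annotation of the page, assumed to be the link (a direct dictionary)
        dictionary = (COSDictionary) arrayForLoop.get(0);
        dictionary.setNeedToBeUpdated(true);

        // Strip keys from the annotation dictionary; the dictionary itself stays in /Annots
        dictionary.removeItem(COSName.TYPE);
        dictionary.removeItem(COSName.SUBTYPE);
        dictionary.removeItem(COSName.RECT);
        dictionary.removeItem(COSName.BORDER);

        tempDic = (COSDictionary) dictionary.getItem(COSName.A);
        tempDic.setNeedToBeUpdated(true);
        dictionary.removeItem(COSName.A);
    }
    // Write only the objects flagged with needToBeUpdated as an incremental update
    try (FileOutputStream out = new FileOutputStream(path_to_save_file)) {
        document.saveIncremental(out);
    }
    document.close();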

In the code above I iterate over every page and delete the /Annots entry that corresponds to the link. I used the saveIncremental() method so that all modified nodes on the path between the catalog and the annotations are written. Thank you for your answers.

  • Why are you using saveIncremental() instead of save()? – Tilman Hausherr Nov 15 '17 at 13:32
  • Because according to this answer (https://stackoverflow.com/questions/42802996/pdfbox-form-fill-saveincremental-does-not-work), PDFBox starts from the catalog and inspects the NeedToBeUpdated property; if it is set to true, PDFBox stores the object, and only in that case does it recurse deeper into the objects referenced from it in search of more objects to store. – user2956128 Nov 15 '17 at 13:46
  • That is true, but you could still use save(). – Tilman Hausherr Nov 15 '17 at 14:00
  • I tried the save() method instead of saveIncremental(), and the resulting PDF file looked like the original, with the links at the top of every page. The save() method does not work for me. Maybe I am doing something wrong. – user2956128 Nov 15 '17 at 14:09
  • Removing the annotation just removes the "link effect", not the text. The text is in the content stream. Another (small) problem in your code is that you are crippling an item in the annotations array instead of just removing the item from the array. – Tilman Hausherr Nov 15 '17 at 14:15
  • Tilman Hausherr, could you please point me to a link explaining how I can remove the link from the content stream? – user2956128 Nov 15 '17 at 14:39
  • *"how can I remove link from content stream"* - that indeed can be highly non-trivial. Can you share the PDF in question so we can check whether there is an easy way? – mkl Nov 15 '17 at 15:25
  • The link doesn't work. Maybe it is a temporary link. – Tilman Hausherr Nov 15 '17 at 16:33
  • I'm not sure if I should help there. This is likely an illegal site; the book sells for $35.99. – Tilman Hausherr Nov 16 '17 at 09:45
  • There is a lot of freely accessible material; it seems I need to find the solution by myself. Thanks for the help. – user2956128 Nov 16 '17 at 10:29
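Following the two suggestions in the comments (use save() instead of saveIncremental(), and remove the entry from the annotations array instead of blanking its keys), a minimal sketch with the PDFBox 2.0 high-level API could look like this. It assumes every unwanted link is a /Link annotation and that the file opens without a user password; the class name RemoveLinks and the command-line arguments are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;

public class RemoveLinks {

    /** Removes every link annotation from every page; returns how many were removed. */
    static int removeLinkAnnotations(PDDocument document) throws IOException {
        int removed = 0;
        for (PDPage page : document.getPages()) {
            int before = removed;
            List<PDAnnotation> kept = new ArrayList<>();
            for (PDAnnotation annotation : page.getAnnotations()) {
                if (annotation instanceof PDAnnotationLink) {
                    removed++; // drop the whole entry instead of stripping its keys
                } else {
                    kept.add(annotation);
                }
            }
            if (removed > before) {
                page.setAnnotations(kept); // replaces this page's /Annots array
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input PDF, args[1]: output PDF (hypothetical command-line layout)
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            if (document.isEncrypted()) {
                // Write the result without encryption
                document.setAllSecurityToBeRemoved(true);
            }
            int count = removeLinkAnnotations(document);
            document.save(new File(args[1])); // full save, not saveIncremental()
            System.out.println("Removed " + count + " link annotation(s)");
        }
    }
}
```

As noted in the comments, this only removes the clickable annotation; the visible text of the link is drawn by the page content stream and stays on the page.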
