
I wrote a program using PDFBox 2.0.27 to remove watermarks, but my method only removes watermarks that lie below the content, not ones above it. Is it possible to also remove watermarks that lie above the content and save the result as a new PDF file? Thank you very much!

I am looking for correct code that I can learn from.

  • PDFBox version 2.0.27 – lizepengg Aug 01 '23 at 09:35
  • How do you intend to recognize watermarks? – mkl Aug 01 '23 at 22:18
  • I removed the watermark below the content by looking up its XObject name and deleting the corresponding object, but for the watermark above the content the lookup returned a null object. This is my first time manipulating PDFs and also my first time using PDFBox, so I may not understand it deeply yet. – lizepengg Aug 02 '23 at 06:42
  • Watermarks are not necessarily contained in an XObject. Do you only target PDFs with watermarks in XObjects? – mkl Aug 02 '23 at 20:19
  • You're right, the watermark in my document is not in an XObject. I removed it by deleting Tj instructions, but that also accidentally deleted some regular content drawn with Tj. I then added a check that treats an instruction as a watermark only if the content of its COSString is longer than 40. I chose 40 myself based on the lengths I read, and with it the watermark was finally removed. But I don't think my approach is very good, because I have to set the length threshold by hand each time I want to remove a watermark. – lizepengg Aug 03 '23 at 03:39
  • The length of the **Tj** argument is usually no indication of a watermark. There are many documents in which only short strings are drawn. In my opinion you should collect a large set of PDFs with watermarks from many different producers and analyze how watermarks are created in those PDFs. Then you can start implementing a remover for them. – mkl Aug 03 '23 at 05:26
  • If we want to implement such a remover, where should we start? Would it be implemented as a subclass of PDFStreamEngine? – lizepengg Aug 03 '23 at 06:08
  • There is no high-level instruction removal API in PDFBox but it offers the low-level mechanics to create such an API. For example, you can use the `PdfContentStreamEditor` from [this answer](https://stackoverflow.com/a/58501254/1729265). You can find some examples for using it in [the answers linked to that question](https://stackoverflow.com/questions/linked/58475104?lq=1). – mkl Aug 03 '23 at 07:44
  • Can you tell me which version of PDFBox you are using? – lizepengg Aug 03 '23 at 09:11
  • It depends. For what? – mkl Aug 03 '23 at 10:40
  • Note that (I hope that) many users here will hesitate to answer a question with something that helps with an activity of questionable legality/morality. Removing watermarks is surely against the interest of whoever applied them, and most likely enables the user to do what they are trying to prevent and presumably have a right to prevent. You probably want to explain why you need help removing watermarks from documents which you have a right to access/use without a watermark. If you have that right, surely you have a different/legal way to access them. – Yunnosch Aug 21 '23 at 08:17
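
For readers following the comments, the core of the Tj-based removal discussed above can be sketched without any PDF library: buffer operands until an operator arrives, then either keep or drop the whole instruction. This is only a minimal sketch under stated assumptions. `Token`, `WatermarkTokenFilter`, `dropShowText` and `render` are hypothetical names invented for illustration and are not PDFBox classes. In PDFBox 2.0.x the tokens would typically come from `PDFStreamParser` and be written back with `ContentStreamWriter`, which is not shown here. The watermark test is passed in as a predicate, since, as noted in the comments, string length alone is usually no reliable criterion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for one parsed content stream token (not a PDFBox class). */
final class Token {
    final boolean operator; // true for operators such as Tj, false for operands
    final String value;     // operator name, or the operand's text

    private Token(boolean operator, String value) {
        this.operator = operator;
        this.value = value;
    }

    static Token op(String name) { return new Token(true, name); }
    static Token operand(String text) { return new Token(false, text); }
}

/** Hypothetical helper: drops Tj instructions whose string operand looks like a watermark. */
final class WatermarkTokenFilter {

    static List<Token> dropShowText(List<Token> tokens, Predicate<String> isWatermark) {
        List<Token> kept = new ArrayList<>();
        List<Token> operands = new ArrayList<>(); // operands seen since the last operator

        for (Token t : tokens) {
            if (!t.operator) {
                operands.add(t); // operands precede their operator in a content stream
                continue;
            }
            // Only a Tj with exactly one operand is a candidate for removal.
            boolean drop = "Tj".equals(t.value)
                    && operands.size() == 1
                    && isWatermark.test(operands.get(0).value);
            if (!drop) {
                kept.addAll(operands);
                kept.add(t);
            }
            operands.clear();
        }
        kept.addAll(operands); // keep any trailing operands unchanged
        return kept;
    }

    /** Joins token values with spaces, purely for inspection. */
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.value);
        }
        return sb.toString();
    }
}
```

Calling `dropShowText` with a predicate such as `s -> s.contains("CONFIDENTIAL")` removes only the matching Tj instructions and leaves every other instruction in place. The sketch ignores the other text-showing operators (`TJ`, `'`, `"`), and in real files the string operand holds character codes in the font's encoding, so matching on readable text may not work without decoding through the font first.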
