Does anyone know if there is a way to check for a watermark on a PDF document using iTextSharp?

I want to do this before adding a new one. I need to add a watermark only if one hasn't already been added by someone else, but I don't know how to check this using iTextSharp's PdfReader class. Something like this:

var reader = new PdfReader(bytes);
var stamper = new PdfStamper(reader, ms);
var dc = stamper.GetOverContent(pageNumber);
bool alreadyStamped = dc.CheckIfTextOrImageExists();
  • This most likely depends on the PDF type and how it was created. Most PDFs are just wrapped up image files, and in these cases there is not a way to do this with iTextSharp. The only way for this to work would be if you can detect multiple layers in the PDF and locate an image within one of those layers. However, usually when a PDF is 'published' these layers are flattened. – JNYRanger Jul 28 '15 at 19:41
  • First off, there are true annotation watermarks, and there is text and/or images that you personally consider to be a watermark. If you add a true watermark then you should just be able to walk each page's `/Annots` array. See [this for a start on that](http://stackoverflow.com/a/8141831/231316). If you've just got arbitrary text then you'll need to [extract text](http://stackoverflow.com/q/16398483/231316), and if you've got images then [extract images](http://stackoverflow.com/a/10692390/231316). Also, [read the comments here](http://stackoverflow.com/q/18018902/231316) on watermarks. – Chris Haas Jul 28 '15 at 21:08
  • Thanks @ChrisHaas for your suggestion. I'll try to find a particular word in the PDF document. That might work. I'm not using images for sure, only plain text. – Semen Shekhovtsov Jul 28 '15 at 21:49
  • Using text search worked in my case: `PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);` – Semen Shekhovtsov Jul 28 '15 at 21:56

1 Answer

After some investigation, and thanks to @ChrisHaas's comment, I was able to implement that check. If the text is present on a particular page, I can find it using SimpleTextExtractionStrategy, even if it's in the WaterMark collection.

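using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// bytes: the PDF as a byte[]; searchText: the watermark text to look for
PdfReader pdfReader = new PdfReader(bytes);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
    // Use a fresh strategy per page so text from earlier pages is not carried over
    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

    string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
    if (currentPageText.Contains(searchText))
    {
        // Watermark text is already on this page; add a new watermark only where it is not found
        Console.WriteLine("text was found on page " + page);
    }
}
pdfReader.Close();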

Hopefully this approach helps someone with a similar issue.
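
For completeness, here is a minimal sketch of how the check could be combined with the stamping step from the question. It assumes iTextSharp 5.x and a plain-text watermark that comes back from text extraction as one contiguous string. The `WatermarkHelper` class, the `AddWatermarkIfMissing` method, the font, and the centered placement are all hypothetical choices made for illustration, not something prescribed by the library.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class WatermarkHelper
{
    // Hypothetical helper: returns a copy of the PDF in which watermarkText is
    // stamped on every page whose extracted text does not already contain it.
    public static byte[] AddWatermarkIfMissing(byte[] pdfBytes, string watermarkText)
    {
        PdfReader reader = new PdfReader(pdfBytes);
        using (MemoryStream output = new MemoryStream())
        {
            PdfStamper stamper = new PdfStamper(reader, output);
            // Font and size are arbitrary choices for this sketch
            Font font = FontFactory.GetFont(FontFactory.HELVETICA, 48f);

            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string pageText = PdfTextExtractor.GetTextFromPage(
                    reader, page, new SimpleTextExtractionStrategy());

                if (pageText.Contains(watermarkText))
                {
                    continue; // this page already carries the watermark text
                }

                // Draw the watermark text centered on top of the page content
                Rectangle size = reader.GetPageSizeWithRotation(page);
                PdfContentByte canvas = stamper.GetOverContent(page);
                ColumnText.ShowTextAligned(
                    canvas, Element.ALIGN_CENTER,
                    new Phrase(watermarkText, font),
                    (size.Left + size.Right) / 2, (size.Bottom + size.Top) / 2, 0f);
            }

            stamper.Close(); // flushes the stamped PDF into the stream
            reader.Close();
            return output.ToArray();
        }
    }
}
```

Calling `WatermarkHelper.AddWatermarkIfMissing(bytes, "CONFIDENTIAL")` twice on the same document should leave the second result without a duplicate stamp, as long as the extracted text really contains the watermark string. Watermarks that are images, or text that extraction splits into fragments, would not be detected this way.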