
Does anyone know how to remove EXIF data from a PDF using PDFBox (preferably) or another tool? Any experience with this would be very helpful. Thanks.

Dushyanth

  • Thanks for your response K J. I actually mean the EXIF data of the embedded images. – Dushyanth Balasubramanian Jun 29 '21 at 23:51
  • Also have a look at the source of the ExtractImages example. From there, replace the resource entry with the updated image XObject. However, this will not work for inline images; that unusual case is much more complex. – Tilman Hausherr Jun 30 '21 at 09:55
  • Do you really mean the EXIF data of the contained images? Most images are stored as raw pixel data, so they contain no metadata; only JP2K images may contain some. Or do you actually mean the PDF metadata in the form of XMP metadata? PDFs do not use EXIF data directly... – Lonzak Jun 30 '21 at 09:57
  • @Lonzak JPEG images (DCTDecode) can contain EXIF. Don't know about JBIG2. – Tilman Hausherr Jun 30 '21 at 10:10
  • @TilmanHausherr See the [answer](https://stackoverflow.com/questions/5653826/how-can-i-extract-images-and-their-metadata-from-pdfs/6197171#6197171) from Mark Storer here – Lonzak Jun 30 '21 at 12:59
  • Thanks for all your valuable comments. The PDFs contain EXIF data originating from their embedded images, and my goal is to strip it. – Dushyanth Balasubramanian Jul 01 '21 at 05:09
  • So is there any other way (another tool that can remove EXIF data from a PDF)? – Dushyanth Balasubramanian Jul 01 '21 at 05:10
  • This looks like a potential option: https://gitlab.tails.boum.org/tails/blueprints/-/wikis/doc/mat/ Has anyone used it? – Dushyanth Balasubramanian Jul 01 '21 at 05:10
  • Also, using PDFBox to remove EXIF data is overkill, correct? – Dushyanth Balasubramanian Jul 01 '21 at 05:12
  • No, I don't think it is overkill. If you have a tool that removes it from a JPEG, then it's a few hours of work if you're familiar with PDFBox, or less than an hour if you know that all images are top level. – Tilman Hausherr Jul 01 '21 at 07:54
  • Thanks @TilmanHausherr. The only tool I have in mind at the moment is https://gitlab.tails.boum.org/tails/blueprints/-/wikis/doc/mat/ However, it looks like this tool can also remove EXIF data from the PDF itself. If that is the case, I am thinking of using it. – Dushyanth Balasubramanian Jul 02 '21 at 19:20
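As a rough illustration of the "tool that removes it from a JPEG" half of the suggestion above, here is a minimal Java sketch (standard library only, not PDFBox) that drops EXIF APP1 segments from a JPEG byte stream, such as the encoded data of a DCTDecode image. It assumes the EXIF block sits in an APP1 segment whose payload starts with `Exif\0\0` and appears before the first SOS marker; everything from SOS onward is copied unchanged, and other APP1 payloads (for example XMP) are kept. The class `ExifStripper` and method `stripExif` are hypothetical names. Getting the raw image bytes out of the PDF and putting an updated image XObject back into the page resources (as in the ExtractImages approach) is not shown, and inline images are not covered.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical helper: removes EXIF (APP1 "Exif\0\0") segments from a JPEG byte stream. */
final class ExifStripper {
    // Identifier that starts the payload of an EXIF APP1 segment.
    private static final byte[] EXIF_ID = {'E', 'x', 'i', 'f', 0, 0};

    static byte[] stripExif(byte[] jpeg) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG: missing SOI marker");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(jpeg.length);
        out.write(0xFF);
        out.write(0xD8); // keep SOI
        int pos = 2;
        while (pos < jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("expected marker at offset " + pos);
            }
            // Skip 0xFF fill bytes that may precede a marker code.
            while (pos < jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
                pos++;
            }
            if (pos >= jpeg.length) {
                throw new IllegalArgumentException("truncated marker");
            }
            int marker = jpeg[pos++] & 0xFF;
            // Standalone markers (EOI, RSTn, TEM) carry no length field.
            if (marker == 0xD9 || (marker >= 0xD0 && marker <= 0xD7) || marker == 0x01) {
                out.write(0xFF);
                out.write(marker);
                if (marker == 0xD9) {
                    break; // EOI: end of image
                }
                continue;
            }
            if (pos + 2 > jpeg.length) {
                throw new IllegalArgumentException("truncated segment length");
            }
            // Segment length is big-endian and includes its own two bytes.
            int len = ((jpeg[pos] & 0xFF) << 8) | (jpeg[pos + 1] & 0xFF);
            int end = pos + len;
            if (len < 2 || end > jpeg.length) {
                throw new IllegalArgumentException("bad segment length at offset " + pos);
            }
            boolean isExif = marker == 0xE1
                    && len - 2 >= EXIF_ID.length
                    && Arrays.equals(Arrays.copyOfRange(jpeg, pos + 2, pos + 2 + EXIF_ID.length), EXIF_ID);
            if (!isExif) {
                out.write(0xFF);
                out.write(marker);
                out.write(jpeg, pos, len); // length field + payload
            }
            pos = end;
            if (marker == 0xDA) {
                // SOS: entropy-coded data follows. Assumption: no EXIF after this point,
                // so copy the remainder (scan data, later markers, EOI) verbatim.
                out.write(jpeg, pos, jpeg.length - pos);
                break;
            }
        }
        return out.toByteArray();
    }
}
```

Because the compressed scan data is copied byte for byte, the image is not re-encoded, so there is no quality loss; only the metadata segment disappears.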

0 Answers