0

I'm looking for a free library (Java/Ruby), that can run on linux, and can extract images and annotations from PDFs; similar to what CGPDFDocument can do on OS X.

Thanks!

1 Answers1

6

I don't know about images, but using the last version of the ruby pdfreader library I was able to succesfully extract the annotations from a big PDF file:

PDF::Reader.open(filename) do |reader|
  reader.pages.each do |page|
    annots_ref = page.attributes[:Annots]
    actual_annots = reader.objects[annots_ref]
    if actual_annots && actual_annots.size > 0
      actual_annots.each do |annot_ref|
        actual_annot = reader.objects[annot_ref]
          unless actual_annot[:Contents].nil?
            puts "Page #{page.number},"+actual_annot[:Contents].inspect
          end
        end
    end
  end       
end

I imagine that something like it could be done to extract images.

Ely Alvarado
  • 61
  • 1
  • 2