
I would like to write a small program, or script, to extract a set of pictures from a PDF.

I have several PDFs, and each of them contains a table of pictures. I would like to have one picture per file, so I need a way to extract them. Because of the layout of the PDFs (a table/grid), it seems that writing a program would be much easier than doing it by hand. However, I have no idea what tools are available.

What libraries are available?


My preference is Python, then C# or Java, then maybe some other language (my C and C++ are rusty; I have not used them for years).

I am on Debian GNU/Linux, so I have a wide choice of tools.

ctrl-alt-delor

1 Answer


I went with PDFBox (an Apache project, so Free Software). It is a Java library and also a command-line tool (the app module). I then wrote a bit of Python to process the extracted text (PDFBox extracts text as well as images) and to rename the image files.
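The answer does not show the Python part. Below is a minimal sketch of what such a renaming script could look like. It assumes that PDFBox has already written one image file per picture with a running number in its name (for example `scan-1.png`, `scan-2.png`; this naming is an assumption), and that one caption per picture has already been pulled out of the extracted text, in the same order as the images. The helpers `caption_to_filename`, `plan_renames` and `apply_renames` are hypothetical names for illustration.

```python
import re
from pathlib import Path


def _natural_key(path):
    # Split "scan-10.png" into ["scan-", 10, ".png"] so numeric parts compare
    # as numbers and "scan-2.png" sorts before "scan-10.png".
    return [int(t) if t.isdigit() else t for t in re.split(r"(\d+)", str(path))]


def caption_to_filename(caption, suffix):
    """Turn a caption into a safe file name, keeping the image's extension."""
    # Collapse every run of characters other than letters, digits, '_', '.', '-'
    # into a single underscore, then trim underscores from both ends.
    stem = re.sub(r"[^\w.-]+", "_", caption.strip()).strip("_")
    return (stem or "unnamed") + suffix


def plan_renames(image_paths, captions):
    """Pair images with captions in order and return (old, new) Path pairs.

    Hypothetical assumption: the n-th extracted image (by its running number)
    belongs to the n-th caption taken from the extracted text.
    """
    images = sorted((Path(p) for p in image_paths), key=_natural_key)
    if len(images) != len(captions):
        raise ValueError(f"{len(images)} images but {len(captions)} captions")
    plan = [
        (img, img.with_name(caption_to_filename(cap, img.suffix)))
        for img, cap in zip(images, captions)
    ]
    # Refuse to continue if two captions map to the same file name,
    # since one rename would overwrite the other.
    if len({new for _, new in plan}) != len(plan):
        raise ValueError("two captions produce the same file name")
    return plan


def apply_renames(plan, dry_run=True):
    """Print each planned rename; only touch the disk when dry_run is False."""
    for old, new in plan:
        print(f"{old} -> {new}")
        if not dry_run:
            old.rename(new)
```

For example, `apply_renames(plan_renames(Path("out").glob("scan-*.png"), captions))` only prints what it would do; pass `dry_run=False` once the pairing looks right. Pairing by order is the fragile part: if the text of the grid is extracted in a different order from the images, the names will be wrong, which is why the sketch defaults to a dry run.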

ctrl-alt-delor