I have a stack of PDFs - potentially hundreds or thousands. They are not all formatted the same, but any of them MAY have one or more tables with interesting information that I would like to collect into a separate database.
Of course, I know I have to write something to do this. Perl is an option for me - or perhaps Java. I don't really care what language so long as it's free (or cheap with a free trial period to ensure it suits my purposes).
I'm looking at CAM::PDF (using Strawberry Perl), but I'm not sure how to use it to locate and extract tables from the files. I guess I do have a preference for Perl, but really I want something that works dependably and makes string manipulation reasonably easy.
What is a good approach for something like this? I'm at square one, so if Java (or Python, etc.) has better hooks, now is a good time to know about it. General pointers are good; starter code would be strongly preferred.