I'm experimenting with pdftohtml, but it occasionally fails to parse tables correctly: it groups the text from two columns into a single cell, which makes my attempts to parse the resulting data futile!
Note that this occurs only once or twice within a PDF and is quite unpredictable.
I've tried the latest versions of pdftohtml (including the 0.40a beta), but to no avail.
Is anyone aware of any Linux-compatible alternatives that might be worth trying?
Thanks,
Sam