Possible Duplicate:
solution to convert PDFs, DOCs, DOCXs into a textual format with python
I am building a document search engine that indexes popular binary document formats, and I am looking for Python libraries that can extract their text.
Reliable converters have proved hard to find. PyPDF, for one, has not extracted text accurately on the files I tried. Please recommend:
- Python libraries that convert these formats to text
- or cross-platform, standalone programs that can be called as a subprocess
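To make the second option concrete, here is a rough sketch of how I would expect to call such a program from the indexer. The function name `extract_text` and the `{path}` placeholder convention are my own inventions for illustration, and the sketch assumes the converter can write its text output to stdout. Not every tool can do that.

```python
import subprocess


def extract_text(command_template, path, timeout=60):
    """Run an external converter on `path` and return its stdout as text.

    `command_template` is a list of arguments in which the literal string
    "{path}" is replaced by the document path. This placeholder convention
    is an assumption made for this sketch, not part of any tool.
    """
    command = [arg.replace("{path}", path) for arg in command_template]
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout/stderr instead of printing them
        timeout=timeout,      # a hung converter should not stall the indexer
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    # Converters differ in output encoding; UTF-8 is assumed here, and
    # undecodable bytes are replaced rather than raising an error.
    return result.stdout.decode("utf-8", errors="replace")
```

If Poppler's `pdftotext` happened to be installed, for example, I believe the call would be `extract_text(["pdftotext", "{path}", "-"], "report.pdf")`, where the trailing `-` asks it to write to stdout. That is only one candidate; the question is which tools are reliable enough to plug in here.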