I'm not able to extract the text from a `.doc` file:

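    # Pass the command as a list and drop shell=True: with shell=True on POSIX,
    # only 'antiword' reaches the shell and filepath is silently dropped.
    proc = subprocess.Popen(['antiword', filepath],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    res = out, extension
    print(res)
    print(err)  # antiword's error output, if any
    #return res
    exit()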

I get the following result:

(b'', 'doc')

Telen
  • Just to make sure: have you checked if `antiword` could really read the file? It is a rather ancient piece of software from 2005, so having a file with extension `.doc`, in 2018, absolutely does not ensure that it can be read this way. – tevemadar Nov 29 '18 at 16:06
  • Can you please tell me how I can check it? – Telen Nov 29 '18 at 17:13
  • You visit the file's folder from command line, type `antiword xy.doc` and see what happens. If it does not work there, it will not work from Python either. – tevemadar Nov 29 '18 at 17:38
  • It seems `antiword` is not installed. I tried to find `antiword` for Windows but could not get it. Can you suggest another method to extract text from a `.doc` file? – Telen Nov 30 '18 at 13:12
  • Possible duplicate of [Read .doc file with python](https://stackoverflow.com/questions/36001482/read-doc-file-with-python) – tevemadar Nov 30 '18 at 13:28
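
The check suggested in the comments (try `antiword` from the command line first) can also be sketched in Python. This is only a minimal sketch, not a confirmed fix: the function name `extract_doc_text` and its `tool` parameter are made up for illustration, and it assumes the tool reports failure through a nonzero exit code and an error message on stderr.

```python
import shutil
import subprocess


def extract_doc_text(filepath, tool="antiword"):
    """Return the text that `tool` prints for filepath (hypothetical helper)."""
    # shutil.which returns None when the executable cannot be found on PATH,
    # which is the situation that produces an empty b'' result above.
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} is not installed or not on PATH")

    # No shell=True: the argument list is handed to the program unchanged.
    proc = subprocess.run([tool, filepath],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # Assumption: a nonzero exit code means the file could not be read.
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.decode(errors="replace"))

    return proc.stdout.decode(errors="replace")
```

With this shape, a missing `antiword` fails loudly instead of quietly returning `b''`, and whatever the tool wrote to stderr ends up in the exception message.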

0 Answers