I am using Textract in Python on a web server as part of an API. I want to POST a URL to the server and have Textract extract the text from the file at that URL (e.g. http://www.sample-videos.com/pdf/Sample-pdf-5mb.pdf).
When I POST the URL, I get a 502 proxy error in response, and my Python log shows:
textract.exceptions.MissingFileError: The file "http://www.sample-videos.com/pdf/Sample-pdf-5mb.pdf" can not be found.
Is this because Textract can't extract text from remote files? If so, is there a workaround?
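A possible workaround, sketched under the assumption that textract.process() only accepts local file paths (which the MissingFileError suggests): download the URL to a temporary file with the standard library, keep the original extension so textract can pick the right parser, and pass that local path to textract. The helper name extract_text_from_url and its extractor/timeout parameters are hypothetical illustrations, not part of textract's API.

```python
import os
import shutil
import tempfile
import urllib.parse
import urllib.request


def extract_text_from_url(url, extractor=None, timeout=30):
    """Hypothetical helper: fetch `url` to a temporary local file, then run
    an extractor on that local path.

    Assumption: textract.process() needs a local file path, so the remote
    file is downloaded first. `extractor` defaults to textract.process and
    is a parameter only so it can be swapped out (e.g. for testing).
    """
    if extractor is None:
        import textract  # imported lazily so this helper loads without textract
        extractor = textract.process

    # Keep the original extension (e.g. ".pdf"); textract chooses its
    # parser from the file extension.
    suffix = os.path.splitext(urllib.parse.urlparse(url).path)[1]

    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        # Open the temp file first so its descriptor is always closed,
        # even if the download itself fails.
        with os.fdopen(fd, "wb") as tmp_file:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                shutil.copyfileobj(response, tmp_file)
        return extractor(tmp_path)
    finally:
        # Clean up the downloaded copy whether or not extraction succeeded.
        os.remove(tmp_path)
```

Usage inside the API handler might then look like text = extract_text_from_url("http://www.sample-videos.com/pdf/Sample-pdf-5mb.pdf"); textract.process() returns bytes, so decode the result if the API needs a str.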
Thanks!