First, my apologies for asking this as a Python newbie. The question probably has nothing to do with convertapi and more to do with my basic lack of understanding of how to interact with APIs.
I'm reading a Google Sheet to find embedded hyperlinks that reference files (PDF, HTML, whatever), and then using convertapi to get a .txt version of each file so that I can do content analysis based on the existence, count, and proximity of various terms.
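For context, the term analysis described above could start from something like the sketch below once the text files exist. `term_stats` and its `window` parameter are hypothetical names for illustration, and the sketch assumes single-word terms and a crude lower-case word tokenizer; proximity is measured in words, not characters.

```python
import re
from itertools import combinations, product


def term_stats(text, terms, window=10):
    """Hypothetical helper: for each single-word term, report whether it
    appears and how often; for each pair of terms, report whether they ever
    occur within `window` words of each other."""
    # Crude tokenizer (an assumption): lower-case runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Word positions of every occurrence of each term.
    positions = {
        t: [i for i, w in enumerate(words) if w == t.lower()] for t in terms
    }
    stats = {t: {"present": bool(p), "count": len(p)} for t, p in positions.items()}
    # A pair counts as "near" if any two occurrences are at most `window` words apart.
    near = {
        (a, b): any(abs(i - j) <= window for i, j in product(positions[a], positions[b]))
        for a, b in combinations(terms, 2)
    }
    return stats, near
```

The pairwise check compares every occurrence of one term with every occurrence of the other, which is simple and adequate for document-sized inputs.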
My question concerns convertapi.convert failing because, in this case, convertapi considers the PDF invalid (I tested the file at convertapi.com and it returned a 5002 error). I don't dispute that the file may be bad; all I want is to detect that convertapi.convert can't convert the file so that I can skip it and move on.
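My Python code has a small function:
import convertapi  # API secret is configured elsewhere in the script

def convert_PDF_to_text(inputfilename):
    # Ask convertapi to convert the PDF (local path or URL) to plain text
    result = convertapi.convert('txt', {'File': inputfilename}, from_format='pdf')
    # Write the resulting .txt file(s) into this directory
    result.save_files('converted_pdf_files')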
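One way to get the "detect it and move on" behaviour is to wrap each conversion in `try`/`except` and record failures instead of letting them stop the script. The sketch below is a hedged illustration, not convertapi's own API: `try_convert` and `convert_all` are hypothetical helper names, and the converter is passed in as a function so the same wrapper works with `convert_PDF_to_text`. It logs only the exception's type name because the traceback shows that calling `str()` on the raised `ApiError` itself failed.

```python
import logging

log = logging.getLogger(__name__)


def try_convert(convert, source, errors=(Exception,)):
    """Hypothetical helper: call convert(source); if it raises one of
    `errors`, log it and return False so the caller can skip this source."""
    try:
        convert(source)
    except errors as exc:
        # Log just the exception type; formatting the exception text may fail.
        log.warning("skipping %s (%s)", source, type(exc).__name__)
        return False
    return True


def convert_all(convert, sources, errors=(Exception,)):
    """Hypothetical helper: try every source and return the ones that failed."""
    failed = []
    for src in sources:
        if not try_convert(convert, src, errors):
            failed.append(src)
    return failed
```

Usage, assuming the exception class named in the traceback (`convertapi.exceptions.ApiError`): `failed = convert_all(convert_PDF_to_text, urls, errors=(convertapi.exceptions.ApiError,))`, where `urls` is the list read from the sheet. Catching a specific class rather than bare `Exception` means genuine bugs in your own code still stop the script; you might also add `requests.exceptions.RequestException` to the tuple if network failures should be skipped too.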
It works fine for some inputs, but one particular PDF URL produces the following output (the first two lines are my program's own messages):
about to call convertapi.convert with filename (https://www.epa.gov/sites/production/files/2016-06/documents/2016_policy_order_revision_6-10-16.pdf)
yes this is the specific file causing the problem: https://www.epa.gov/sites/production/files/2016-06/documents/2016_policy_order_revision_6-10-16.pdf
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/client.py", line 46, in handle_response
    r.raise_for_status()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://v2.convertapi.com/convert/pdf/to/txt?Secret=PIuLcqNVL8w4rc9Y

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./p1.py", line 244, in <module>
    convert_PDF_to_text(source_URL)
  File "./p1.py", line 63, in convert_PDF_to_text
    result = convertapi.convert('txt', { 'File': inputfilename }, from_format = 'pdf')
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/api.py", line 7, in convert
    return task.run()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/task.py", line 26, in run
    response = convertapi.client.post(path, params, timeout = timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/client.py", line 16, in post
    return self.handle_response(r)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/convertapi/client.py", line 49, in handle_response
    raise ApiError(r.json())
convertapi.exceptions.ApiError: <exception str() failed>
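Reading the traceback: the "During handling of the above exception" line means the client first hit a `requests` `HTTPError` (the HTTP 500), and while handling it raised `convertapi.exceptions.ApiError`. Only the last exception reaches your code, so that is the one to catch; the original is still attached as its `__context__`. The sketch below reproduces the pattern with hypothetical stand-in classes (not the real library classes) so it runs on its own.

```python
class HTTPError(Exception):
    """Stand-in for requests.exceptions.HTTPError (hypothetical, demo only)."""


class ApiError(Exception):
    """Stand-in for convertapi.exceptions.ApiError (hypothetical, demo only)."""


def handle_response(status):
    # Mimics the traceback: the HTTP status check fails, and a second
    # exception is raised while the first one is being handled.
    try:
        if status >= 400:
            raise HTTPError(f"{status} Server Error")
    except HTTPError:
        raise ApiError("conversion failed")
    return "ok"


try:
    handle_response(500)
except ApiError as exc:
    caught = exc
    # Python links the earlier exception automatically via __context__.
    print(type(exc).__name__, "wrapping", type(exc.__context__).__name__)
```

This prints `ApiError wrapping HTTPError`: catching the outer exception is enough to keep the script running, and the inner one remains available if you want the HTTP details.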
I know it should be obvious from the errors what I should check, but I'm too new to Python and APIs to decipher them.
How do I test for errors so that my Python code doesn't abort?
Thanks in advance, and again, sorry for the basic question. Yes, I did search for answers but didn't find anyone addressing this; it's probably too simple...