I have a Django project that creates PDFs using Java as a background task. Sometimes the process can take a while, so the client uses polling like this:
- The first request starts the build process and returns `None`.
- Each subsequent request checks to see if the PDF has been built.
  - If it has been, it returns the PDF.
  - If it hasn't, it returns `None` again and the client schedules another request to check again in n seconds.
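The polling flow above can be sketched roughly as follows. This is only an illustration of the server-side logic, not the actual view: `poll_pdf`, `start_build`, `is_ready`, and the in-memory `_builds_started` set are all hypothetical names, and a real Django project would more likely track started builds in the database or cache.

```python
# Hypothetical in-memory record of which builds have been kicked off.
_builds_started = set()


def poll_pdf(path_to_file, start_build, is_ready):
    """Return the PDF bytes once the file is ready, otherwise None.

    start_build and is_ready are hypothetical callables standing in for
    the Java launcher and the readiness check, respectively.
    """
    if path_to_file not in _builds_started:
        # First request: start the background build and return immediately.
        _builds_started.add(path_to_file)
        start_build(path_to_file)
        return None

    if is_ready(path_to_file):
        # The build has finished, so hand back the file contents.
        with open(path_to_file, "rb") as f:
            return f.read()

    # Still building: the client is expected to poll again later.
    return None
```

Everything hinges on `is_ready`, which is the piece that is missing.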
The problem I have is that I don't know how to check if the PDF is finished building. The Java process creates the file in stages. If I just check if the PDF exists, then the PDF that gets returned is often invalid, because it is still being built. So, what I need is an `is_pdf(path_to_file)` function that returns `True` if the file is a valid PDF and `False` otherwise.
I'd like to do this without a library if possible, but will use a library if necessary.
I'm on Linux.
Here is a solution that works using pdfminer, but it seems like overkill to me.
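
    from pdfminer.high_level import extract_text

    def is_pdf(path_to_file):
        """Return True if path_to_file is a readable PDF"""
        try:
            # Parsing only the first page is enough to show the file is readable.
            extract_text(path_to_file, maxpages=1)
            return True
        except Exception:
            # Any parsing or I/O failure is treated as "not a valid PDF (yet)".
            return False
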
I'm hoping for a solution that doesn't involve installing a large library just to check if a file is a valid PDF.
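
For illustration, the kind of lightweight, library-free check described above might look like the sketch below. It is a heuristic, not real validation: it only tests that the file starts with the `%PDF-` header and that the `%%EOF` marker appears near the end. It assumes the Java writer emits `%%EOF` last and only once, which may not hold for every PDF generator (incrementally updated PDFs, for example, contain several `%%EOF` markers). The function name `looks_like_finished_pdf` and the `tail_size` parameter are made up for this example.

```python
import os


def looks_like_finished_pdf(path_to_file, tail_size=1024):
    """Heuristically check whether path_to_file is a completely written PDF.

    Returns True only if the file starts with the "%PDF-" header and the
    "%%EOF" marker appears within the last tail_size bytes. This does not
    verify that the PDF structure in between is valid.
    """
    try:
        with open(path_to_file, "rb") as f:
            # A PDF file begins with a header such as "%PDF-1.7".
            if f.read(5) != b"%PDF-":
                return False
            # Read only the tail of the file to look for the end marker.
            f.seek(0, os.SEEK_END)
            size = f.tell()
            f.seek(max(0, size - tail_size))
            tail = f.read()
    except OSError:
        # Missing or unreadable file: treat it as not ready.
        return False
    return b"%%EOF" in tail
```

A file that is still being written would normally lack the trailing `%%EOF`, so this returns `False` until the writer finishes, but a corrupt file that happens to have both markers would still pass.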