How do I reliably detect if a script/module is being run in the Spyder IDE?

I've hit an issue running ocrmypdf in the Spyder IDE. It works from cmd and the Anaconda Prompt, but it errors out when run in Spyder on Windows 7 and 10, across various machines and various new and old Anaconda setups. (See the stub and inline comments below for details on the errors.) The developer of ocrmypdf suggested that this is because multiprocessing does not work in the Spyder IDE (see: Python's multiprocessing doesn't work in Spyder IDE). I want to know whether there is a reliable way to detect that ocrmypdf, or any script/module, is being run in the Spyder IDE.

Basically, this is a repeat of: Detect where Python code is running (e.g., in Spyder interpreter vs. IDLE vs. cmd)

I'm asking this question again because the original was asked in 2013, and the accepted answer (checking for environment variables that Spyder sets in os.environ) is workable but risks false positives.
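
For reference, the environment-variable approach might look roughly like the sketch below. The function name `looks_like_spyder` is made up for illustration, and the patterns it matches (names containing "SPYDER" or starting with "SPY_") are an assumption about what Spyder sets; the exact variables can differ between Spyder versions, so check `os.environ` in your own console first.

```python
import os


def looks_like_spyder(environ=None):
    """Heuristic check: does any environment variable name look Spyder-related?

    ASSUMPTION: Spyder exports variables whose names contain "SPYDER" or
    start with "SPY_". This is not guaranteed for every Spyder version.
    """
    if environ is None:
        environ = os.environ
    for name in environ:
        upper = name.upper()  # Windows variable names are case-insensitive
        if "SPYDER" in upper or upper.startswith("SPY_"):
            return True
    return False


if __name__ == "__main__":
    print("Spyder-like environment:", looks_like_spyder())
```

The false-positive risk is visible here: a process launched from a Spyder console inherits these variables, and any unrelated tool that happens to define a `SPY_...` variable will match as well.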

If there's some cleverer way of resolving this please let me know!


import os, io
import ocrmypdf
from wand.image import Image as Img

try:
    from PIL import Image
except ImportError:
    import Image
    
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


ocrmypdf_exitcodes = {0:'ok', 1:'bad_args', 2:'input file', 3:'missing_dependency', 
                      4:'invalid_output_pdf', 5:'file_access_error', 6:'already_done_ocr', 
                      7:'child_process_error', 8:'encrypted_pdf', 9:'invalid_config', 
                      10:'pdfa_conversion_failed', 15:'other_error', 130:'ctrl_c'}

path = r"C:\Users\public\Documents"  # raw string: "\U" in a normal string starts a Unicode escape and is a SyntaxError
tess_lang = "eng"

#Test files from https://github.com/jbarlow83/OCRmyPDF/tree/master/tests/resources

file = "skew.pdf" #works
file = "cardinal.pdf" #breaks at scanning contents section/it's been 20 minutes with no progress past first page
file = "c02-22.pdf" #Breaks at OCR section on first page - logs say 0.5 and then it stalls for 10+ minutes. Sometimes breaks by saying [Errno9] Bad File Descriptor instead.


pdf = os.path.join(path, file)
try:
    filename = pdf.rsplit('.', 1)[0]+'_new.pdf'
    ocrmypdf.ocr(input_file = pdf, output_file = filename, language = '+'.join(list(set([tess_lang, 'eng']))), rotate_pages=True, deskew=True, force_ocr = False)
except Exception as e:
    filename = pdf
    print('Error occurred when trying to process file {} error message is: {}'.format(pdf, repr(e) + " " + str(e)))
    print(repr(e))
    try:
        print(ocrmypdf_exitcodes[e.returncode])
    except (AttributeError, KeyError):  # exception has no returncode, or the code is not in the table
        pass
Spencer
  • It's platform-dependent of course but you could look at the ancestors in the process tree to see if spyder is one of them. Not an easy task though, requires digging into the relevant native APIs - and there can always be theoretical false positives because whatever you are checking for could also be simulated by some other program... – CherryDT Jul 12 '21 at 22:07
  • Spyder uses the IPython console by default, but can be changed to use the system terminal instead for the most compatibility. In general it helps to stay away from interactive prompts for multiprocessing and GUI applications. This rules out things like Jupyter, and IPython – Aaron Jul 12 '21 at 23:39
  • Sometimes applications preemptively hold resources in scheduling; there are always chances of false-positives. – Jishan Shaikh Jul 12 '21 at 23:48
  • also: "It errors out".. please elaborate. This seems like an x-y problem to me... Spyder may not reliably capture stdout from child processes, but they should run just fine anyway. Getting print statements to work is as simple as changing the run settings to use an external terminal. Actual errors are not IDE dependent generally. – Aaron Jul 12 '21 at 23:51
  • @Aaron Thanks for your replies! I provided additional comments in the code and pointed to them in the sentence right after I said "it errors out". Tl;dr, the biggest issues are Errno 9 Bad File Descriptor and ocrmypdf simply stalling out: no error message, no output created. Switching ocrmypdf to the "use-threads" mode fixes these issues. If there's a way to detect IPython, ocrmypdf can use threads automatically so future users of ocrmypdf don't need to change to the system terminal. I'm not sure whether there are interactive prompts beneath the surface, but I haven't seen any while running ocrmypdf. – Spencer Jul 13 '21 at 16:46
  • @JishanShaikh The chance of a false positive is fine, I'm just trying to minimize it. What's an example of applications pre-emptively holding resources in scheduling? I'm familiar with them occupying more memory than they actually use, but wondered if there's some cool other example. – Spencer Jul 13 '21 at 16:51
  • @Spencer can you share the full traceback of the OSError? Bad file descriptor generally means broken file handle and knowing which file and where in the code would be helpful to actually fix the problem rather than just avoid it. (broken file handle can mean a number of things like trying to read from a write only file, or trying to access a file that got closed unexpectedly (possibly stdout as I see lots of stdout redirection going on with ocrmypdf)) – Aaron Jul 13 '21 at 19:43
  • `"get_ipython" in globals()` will detect IPython in general, not just Spyder. There may be configurations of IPython that would work just fine, but this is the simplest answer... – Aaron Jul 13 '21 at 20:03
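
A minimal sketch of the IPython check from the last comment, written so that it can also be called from inside an imported module (where `globals()` is the module's own namespace and would not normally contain `get_ipython`). It relies on `IPython.get_ipython()` returning `None` when no shell is active. The helper names `running_in_ipython` and `pick_worker_mode` are hypothetical and not part of ocrmypdf, and this detects any IPython session, not Spyder specifically.

```python
import sys


def running_in_ipython():
    """Return True if an IPython shell appears to be active in this process."""
    # If IPython was never imported, no IPython shell can be running here.
    ipython = sys.modules.get("IPython")
    if ipython is None:
        return False
    get_shell = getattr(ipython, "get_ipython", None)
    # get_ipython() returns None when no interactive shell has been started.
    return get_shell is not None and get_shell() is not None


def pick_worker_mode():
    """Hypothetical helper: prefer threads under IPython, processes otherwise."""
    return "threads" if running_in_ipython() else "processes"


if __name__ == "__main__":
    print("IPython active:", running_in_ipython())
    print("Worker mode:", pick_worker_mode())
```

Looking the module up in `sys.modules` instead of importing it avoids pulling in IPython just to run the check.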

0 Answers