I have two Python scripts: the first is a parser that scans through thousands of files, and the second is a scheduler that forks off a parser scan for each of hundreds of separate directories. My problem is this:
I have a limited amount of disk space, and each scan uses around 1 GB of local sqlite3 storage, so I need to cap the number of concurrent processes low enough that I don't hit the disk I/O errors I've been getting (at 8 concurrent scans, that's roughly 8 GB of scratch space).
I've tried the code below to fork the scans and hold the process count at 8, but when I look in my temp directory (where the local temp files are stored) there are substantially more than 8 files, which tells me I'm not limiting the processes properly. (I use os.remove to delete each temp file after its scan finishes; a sketch of that cleanup is below.)
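For reference, the cleanup follows roughly this pattern (a minimal sketch; cleanup_temp and db_path are placeholder names, not my exact code):

    import os

    def cleanup_temp(db_path):
        # Remove the scan's ~1 GB local sqlite3 scratch file once its
        # scan has finished; wrapped so a missing file doesn't crash
        # the scheduler.
        try:
            os.remove(db_path)
        except OSError as e:
            log(e)  # log() is the same helper used in execute_scan below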
This is my execute_scan function; it just runs a well-formed command in a child process:
    import subprocess

    def execute_scan(cmd):
        try:
            log("Executing " + str(cmd))
            # call() blocks until the scan process exits, so each pool
            # worker should run only one scan at a time
            subprocess.call(cmd, shell=False)
        except Exception as e:
            log(e)
            log(cmd)
This is in my main function, where getCommand(obj) converts an object's data into a command list:
    tasks = [cmd for cmd in (getCommand(obj) for obj in scanQueue)
             if cmd is not None]
    multiprocessing.Pool(NUM_PROCS).map(execute_scan, tasks)
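For clarity, this is the overall pattern I'm aiming for (a minimal sketch; scanQueue, getCommand, NUM_PROCS, and execute_scan are as above, and the close/join is just to make sure the pool shuts down cleanly):

    import multiprocessing

    if __name__ == '__main__':
        # Build the task list once, skipping objects with no command
        tasks = [cmd for cmd in (getCommand(obj) for obj in scanQueue)
                 if cmd is not None]
        pool = multiprocessing.Pool(NUM_PROCS)  # NUM_PROCS = 8 here
        try:
            # map() blocks until every task finishes; with NUM_PROCS
            # workers, at most NUM_PROCS scans (and so at most NUM_PROCS
            # temp files) should exist at any one time
            pool.map(execute_scan, tasks)
        finally:
            pool.close()
            pool.join()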
I could use any advice I can get because I'm dealing with a lot of data and my disk is not that big.
Thanks a lot!