I'm a bit of a Python newbie here. I'm working on code that requests JSON data from a web URL, repeats the request until it has collected data through a certain timeframe, saves all of the data to a file (it could be millions of lines, so I'm trying to keep it out of memory), then runs a statistical analysis and compresses the result to a single line of a CSV file. I've got that section of code down, but the program loops through a list of several thousand names, each of which is used as a variable in the URL call. If I run it as a single sequential loop, each pass takes longer than my timeframe and the program keeps falling further behind.
I've attempted to run this either as an asyncio loop using ThreadPoolExecutor or as a pool with dozens of workers. I'm able to use substantially more threads than I have processor cores because the bulk of the time is spent waiting on URL responses, which leaves the other threads free to make new requests.
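To illustrate why more threads than cores can help here, this is a minimal sketch rather than my real code. The function fake_request is a hypothetical stand-in for the blocking URL call (the sleep simulates network latency), and the worker count of 50 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(name):
    """Hypothetical stand-in for a blocking URL call."""
    time.sleep(0.05)  # simulated network wait; the thread is idle here
    return name.upper()


def fetch_all(names, max_workers=50):
    # Threads spend most of their time blocked on I/O, so max_workers
    # can be much larger than the number of CPU cores.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() returns results in the same order as the inputs.
        return list(executor.map(fake_request, names))


if __name__ == "__main__":
    names = [f"name{i}" for i in range(200)]
    start = time.perf_counter()
    results = fetch_all(names)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} simulated requests in {elapsed:.2f}s")
```

Run sequentially, 200 simulated requests at 0.05 s each would take about 10 s; with 50 threads the waits overlap, so the total should be far shorter.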
That said, I can't get any form of a pool or loop to continue past a single iteration of the while loop. The code looks something like this:
    variables = ['thousands', 'of', 'variables']
    interval = 15  # in minutes

    class DoSomething():
        def dosomething(self, variable, date, initialtime, interval):
            callweburl(variable, date, initialtime, interval)
            runstatistics
            saveCSV

    def worker(variable):
        try:
            ds = DoSomething(variable, date, initialtime, interval).dosomething()
            api.ds(variable)
        except:
            return False

    pool = Pool(100)
    program to get date, initialtime and currenttime
    while initialtime < currenttime:
        while initialtime < initialtime * multiple of interval:
            if __name__ == '__main__':
                for variable in variables:
                    pool.apply_async(worker, (variable,))
            initialtime = initialtime + interval
        program to get date, new initialtime and currenttime
        time_to_pause = initialtime - currenttime + interval
        if time_to_pause > 0.0:
            time.sleep(time_to_pause)
The loops run fine when I replace the apply_async call with DoSomething(variable, date, initialtime, interval).dosomething(). When I run them with either a pool or a loop, they become sporadic at best. Depending on where I place pool.close() and pool.join(), they either run for a single loop and close the program, or they are all over the board for the intervals the pool is collecting for. Sometimes it will collect data for the same time interval twice, and other times it will skip ahead by days at a time.
Is there a way to close out a loop or pool and reinitialize it? I've also tried moving the pool or loop initialization to before the while loops are called. Nothing seems to work quite right.
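For concreteness, here is a minimal sketch of what "close out and reinitialize" could look like. It is an illustration under assumptions, not a confirmed fix. It assumes a thread-based pool (multiprocessing.pool.ThreadPool), and the names run_interval, run_until and window_start are hypothetical. The worker body is a placeholder for the real URL call, statistics and CSV step. Two things differ from the code above: each task receives its interval start as an argument instead of reading a shared initialtime, and each interval gets its own pool that is closed and joined before the next interval begins. The first point matters because apply_async returns immediately without waiting for the task, so the loop can advance initialtime while earlier tasks are still queued.

```python
from datetime import datetime, timedelta
from multiprocessing.pool import ThreadPool


def worker(variable, window_start, interval):
    """Placeholder for the real work (URL call, statistics, CSV row)."""
    try:
        window_end = window_start + interval
        return (variable, window_start, window_end)
    except Exception:
        return None  # signal failure without killing the pool


def run_interval(variables, window_start, interval, workers=100):
    # A fresh pool per interval; it is closed and joined before returning.
    pool = ThreadPool(workers)
    try:
        args = [(v, window_start, interval) for v in variables]
        # starmap blocks until every task for this interval has finished.
        return pool.starmap(worker, args)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker threads to exit


def run_until(variables, start, end, interval, workers=100):
    results = []
    window_start = start
    while window_start < end:
        results.extend(run_interval(variables, window_start, interval, workers))
        window_start += interval  # advanced only after the interval is done
    return results


if __name__ == "__main__":
    start = datetime(2020, 1, 1)
    end = start + timedelta(minutes=45)
    rows = run_until(["a", "b"], start, end, timedelta(minutes=15), workers=4)
    for row in rows:
        print(row)
```

Because starmap blocks, the outer loop cannot move on to the next interval until the current one is complete, which should rule out both duplicated and skipped intervals in this sketch.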
Thanks in advance for any help!