Python using schedule library to download files

Question

I have a list of ids, called oaids, that I want to download from Monday to Friday, only between midnight and 6pm. Since it is a list, I don't want the job to start again for ids that have already been downloaded.
Below is the approach I thought of, but I am not sure about it. Do you have any suggestions?

import os
from concurrent.futures import ProcessPoolExecutor

import schedule
from schedule import repeat
from more_itertools import chunked

def main():
  oaids = ['id1', 'id2', 'others..']
  for chunked_oaids in chunked(oaids, os.cpu_count()):
    schedule.every() \
      .monday \
      .to(5).days \
      .at('00:00:00') \
      .to(15).hours \
      .do(do_download_job, oaids=chunked_oaids)

def do_download_job(oaids):
  with ProcessPoolExecutor(os.cpu_count()) as ex:
    results = [ex.submit(download_and_upload, oaid, target_az_container, az_subfolder) for oaid in oaids]
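
A minimal sketch of the two checks described above, using only the standard library: whether the current time falls inside the Monday-to-Friday, midnight-to-6pm window, and which ids are still waiting to be downloaded. The names `in_download_window` and `next_chunk` are hypothetical, and the 18:00 cut-off is taken from the description rather than from the code.

```python
from datetime import datetime, time

WINDOW_START = time(0, 0)   # midnight
WINDOW_END = time(18, 0)    # 6pm, treated as exclusive (assumption)


def in_download_window(now: datetime) -> bool:
    """Return True on Monday-Friday between midnight and 18:00."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WINDOW_START <= now.time() < WINDOW_END


def next_chunk(oaids, done, size):
    """Return up to `size` ids that are not yet in the `done` set."""
    return [oaid for oaid in oaids if oaid not in done][:size]
```

A job could call `in_download_window(datetime.now())` first and return early outside the window, then add each finished id to `done` so that it is never picked up again.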

I am creating different jobs with different ids; my hunch is that those ids will be downloaded every day. Instead, I want them to be downloaded only once. Furthermore, I want a chunk of ids to be downloaded only after the previous chunk has completed. Any ideas? — 3nomis, Sep 24 '21 at 12:46
Turn hunches into facts. See what happens. Solve that problem _if_ it happens. — Paul Collingwood, Sep 24 '21 at 12:50
The fact that you are using schedule.every( indicates it will be re-scheduled. Does the package have a "once" option perhaps? — Paul Collingwood, Sep 24 '21 at 12:50
@PaulCollingwood Yeah, potentially I can call `schedule.CancelJob` once finished. I'm just concerned about whether the runs are chronological and whether each one waits for the previous to complete. Is a `fact` a schedule library object? — 3nomis, Sep 24 '21 at 12:55
It depends on how the scheduler works; I'm not familiar with it. I do this sort of work with Celery, but that may be overkill for this. — Paul Collingwood, Sep 24 '21 at 13:01
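
On the `schedule.CancelJob` idea from the comments, here is a rough sketch rather than a tested solution. A job that returns `schedule.CancelJob` is removed from the scheduler, and `run_pending()` executes due jobs one after another in the calling thread, so a chunk does not start before the previous call has returned. The helper names `make_chunk_job` and `run_until_done`, the `handle_chunk` callback, and the one-minute interval are all assumptions made for illustration.

```python
import time
from collections import deque


def make_chunk_job(pending, handle_chunk, cancel_token):
    """Build a job that handles one chunk per call and cancels itself when done."""
    def job():
        if pending:
            handle_chunk(pending.popleft())
        # Once nothing is left, hand the cancel token back to the scheduler.
        return cancel_token if not pending else None
    return job


def run_until_done(chunks, handle_chunk):
    """Process each chunk exactly once, one chunk per scheduler tick."""
    import schedule  # third-party package: pip install schedule

    job = make_chunk_job(deque(chunks), handle_chunk, schedule.CancelJob)
    scheduled = schedule.every(1).minutes.do(job)
    # The job disappears from schedule.jobs after it returns CancelJob.
    while scheduled in schedule.jobs:
        schedule.run_pending()
        time.sleep(1)
```

Here `handle_chunk` would be something like `do_download_job` from the question. Because each chunk is popped from the queue before it is handled, no chunk is downloaded twice, even though the job itself keeps firing until the queue is empty.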
