0

I have a Python 3 script (using PRAW library) that executes once and ends. It currently is automated using cron jobs and runs every 45 minutes.

There is a need to change this to a persistence bot so the bot is always "online", so cron cannot be used. Part of this is easy enough:

def bot_loop():
   running = True
   while running:
      try:
         #This should occur constantly and monitor Reddit API stream
         for submission in subreddit.stream.submissions(skip_existing=False):
            print(submission.title)
         #TODO: below condition to  only execute the below every 45 minutes
            health_checks()
      except KeyboardInterrupt:
         print('Keyboard Interrupt. Ending bot.')
         running = False
      except Exception as e:
         print('Exception raised per below. Attempting to continue bot in 10 seconds.')
         print(e)
         time.sleep(10)

What would be the best logic in this loop to ensure the health checks only runs every 45 minutes? Meanwhile the rest of the script would continue and process. Additionally, what is the best way to also ensure that if for some reason it does not run on the 45th minute (say at xx:45:00) such as perhaps CPU is busy elsewhere, it runs at the next opportunity?

The logic should be:

Considerations could be if minute == 45, but that alone has issues (it would run at least 60 times in the minute).

Zeno
  • 1,769
  • 7
  • 33
  • 61
  • There are cron-like modules for Python, which might be worth investigating, as such do the heavy lifting (suitability of integration / features / quality varies): https://pypi.org/search/?q=cron – user2864740 Jul 18 '20 at 05:21
  • 1
    Sleep for 45 minutes as the last statement in the `try` block. – John Gordon Jul 18 '20 at 05:28
  • @JohnGordon I updated my question to clarify, as I wasn't clear. The whole script shouldn't sleep for 45min, only that code section. – Zeno Jul 19 '20 at 05:11
  • Since this code is not left anyways (it is an infinite loop) no other code section can run - regardless how the waiting is done. How would your expected behavior differ from a sleep for 45 minutes? – MisterMiyagi Jul 19 '20 at 05:14
  • The bot should be monitoring `subreddit.stream.submissions()` 24/7. But the bot should only perform "main code" (such as health checks) every 45min – Zeno Jul 19 '20 at 15:43
  • From the shell, you can simply do `watch -n 3600 ./yourscript` to run every hour. – NVRM Jul 19 '20 at 17:06
  • @NVRM What purpose does this accomplish? The health check is things like Reddit API checks, not so much checking the script uptime. – Zeno Jul 19 '20 at 19:13
  • It's a way to code. `./yourscript` execute the task one time, then quit. No need to create loops. – NVRM Jul 19 '20 at 19:26
  • `subreddit.stream.submissions()` from PRAW needs to run 24/7. – Zeno Jul 20 '20 at 00:21
  • My answer to this question : https://stackoverflow.com/questions/45909652/update-csv-with-new-values-every-minute-sched-time-vs-apscheduler answers your question as well. In fact, by the code logic of answers, they could be considered duplicates. Although they are very differently composed as Qs. Read it as no answer here offers you threading.Timer() solution yet. I post it as a comment because it seems silly to retype everything. – Dalen Jul 27 '20 at 14:38
  • Only thing I can add to it is that you do not have to be afraid of using time.sleep(). sleep() is a system call and it tells the OS scheduler to put your thread to sleep for a given number of seconds. OS will not be bothered by the thread until interval expires and the thread will not be using CPU during the sleep(). Only way to wake up your sleeping thread from outside is by sending it a signal to interrupt it. Then the OS will wake up your thread and pass it the signal, which is usually SIGTERM or SIGINT. In this case time.sleep() raises an exception, so put it in a try-except block. – Dalen Jul 27 '20 at 14:48

5 Answers5

3

Try using celery. With celery, you can relaunch the subsequent task with eta=45 minutes when the task finishes.

PS. I am not writting the entire snippet, but just a skeleton. You can use max_retries etc for multiple attempts in case of failure

@task
def my_task(...):
     ....
     my_task.delay(args=..., eta=45 * 60)
  • How would this be different compare to the suggested use of sleep? – Zeno Jul 18 '20 at 14:39
  • It is super different. In sleep, the process is still running the entire time, and the task might not even run if the process gets killed midway during sleep. But via celery, you are adding a message to the queue providing it an eta when to launch the task – Ehtesham Siddiqui Jul 18 '20 at 14:55
  • 1
    I updated my question as I missed part of my intent. So to clarify, my_task would run every 45min but the rest of the Python script would continually run? – Zeno Jul 19 '20 at 05:12
2

Two options come to mind:

  1. The sched event handler.
  2. The very similar named schedule by Dan Bader of Real Python
import schedule
import time

def job():
    print("I'm working...")

schedule.every(45).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
hrokr
  • 3,276
  • 3
  • 21
  • 39
1

You need multi-threading for that purpose.

import threading


def main_code():
    print("Doing health checks and other stuff...")
    threading.Timer(45*60, main_code).start()

Call main_code at start of bot_loop. The code schedules itself to run 45 minutes later each time it is called.

Executing periodic actions in Python

What is the best way to repeatedly execute a function every x seconds?

Nima
  • 381
  • 2
  • 8
1

There are multiple Python modules available to accomplish your required needs. Some of these modules include:

Modules:

Advanced Python Scheduler (APScheduler)

schedule

timeloop

In 2019, I posted an answer for another question on using a scheduler in Python. Here is that question and my answer.

Concerning your question here are two ways to tackle the problem.

Timeloop Example:

import time
from datetime import timedelta
from timeloop import Timeloop

tl = Timeloop()

# @tl.job(interval=timedelta(minutes=45))
@tl.job(interval=timedelta(minutes=1))
def health_checks():
    print('running a health check')
    print("job current time : {}".format(time.ctime()))

def bot_loop():
    # timeloop is designed to run on a separate thread
    tl.start()
    while True:
        try:
           print ('running bot')
           time.sleep(10)
         except KeyboardInterrupt:
             print('Keyboard Interrupt. Ending bot.')
             tl.stop()
          except Exception as e:
              print('Exception raised per below. Attempting to continue bot in 10 seconds.')
              print(e)
              time.sleep(10)


if __name__ == "__main__":
   bot_loop()

Schedule Example:

import schedule
import time

def health_checks():
    print('running a health check')
    print("job current time : {}".format(time.ctime()))

def bot_loop():
    while True:
        try:
           print ('running bot')
           time.sleep(10)
         except KeyboardInterrupt:
             print('Keyboard Interrupt. Ending bot.')
             schedule.CancelJob()
         except Exception as e:
             print('Exception raised per below. Attempting to continue bot in 10 seconds.')
             print(e)
             time.sleep(10)


if __name__ == "__main__":
  # schedule.every(45).minutes.do(health_checks)
  schedule.every(1).minutes.do(health_checks)
  while True:
      schedule.run_pending()

  bot_loop()
Life is complex
  • 15,374
  • 5
  • 29
  • 58
1

The inner core of the program looks like:

for data in read_from_data_stream():
      process(data)

where read_from_data_stream() is provided by some 3rd party library and yields some kind of data as it arrives. In addition to the service above, a health_check() function should be called every 45 minutes.

The problem is that we want to do two activities at the same time. This is not possible in a regular single threaded program. But is it really a problem?

We could do at least some health checking:

CHECKTIME = 45*60.0  # in seconds

last_check = time.monotonic()
for data in read_from_data_stream():
    process(data)
    now = time.monotonic()
    if now > last_check + CHECKTIME:
        health_check()
        last_check = now

and it could be sufficient, even almost equivalent to the original specification, if data is coming often, say at least every few seconds or so.

But even if new data is not available for longer periods of time, it might be still acceptable. If there is no activity, no data processing, the health_check could be unnecessary, because nothing has changed since the last one.

The code can be further improved if the read_from_data_stream() offers a timeout option:

while True:
    try:
        for data in read_from_data_stream(timeout=CHECKTIME):
            ... for loop body as above ...
    except DataTimeoutError:
        ... run extra health_check ...
        continue

If the solution above is not good enough, there are two options. Use an async version of the library, if available, or spawn a new thread:

The main thread runs the loop, the additional thread runs the health check:

while True:
    time.sleep(CHECKTIME)
    health_check()

but almost certainly the health_check() and process(data) are accessing the same internal data structures, making a mutex lock mandatory.

VPfB
  • 14,927
  • 6
  • 41
  • 75