
I am new to Python and wonder how to limit how long a function is allowed to run.

I have a dataframe df with 891 rows and a column named body_text. I want to call a function summary (which summarizes the text) on each row and store the results in a new dataframe called df_summary. My goal is to let summary run for at most 2 minutes per row: if it finishes within that time, the loop moves on to the next iteration. If it does not, I want to call another function, preprocess, in place of the summary call that could not finish within 120 s.

Assume df_summary has already been created with 891 rows and a column named body_text. Here is my code, partly pseudocode because I don't know how to write it exactly:

import time
from tqdm import tqdm

for i in tqdm(range(0, 891)):
    # code to detect running time here
    df_summary['body_text'][i] = summary(df['body_text'][i])  # run summary for at most 120 s
    # if summary has not finished after 120 s, run preprocess instead and move to the next iteration
    df_summary['body_text'][i] = preprocess(df['body_text'][i])

Note that if summary finishes within 120 s, the fallback function preprocess should not run, and the loop moves straight to the next iteration.

How can I achieve this? Any help will be much appreciated.

Erwin

1 Answer


Here is my answer:

import sys
import threading
try:
    import thread
except ImportError:
    import _thread as thread


def exit_after(s):
    '''
    Use as a decorator to interrupt the decorated function
    if it takes longer than s seconds.
    '''

    def quit_function(fn_name):
        # Report the timeout on stderr, then interrupt the main thread.
        sys.stderr.write('{0} took too long\n'.format(fn_name))
        sys.stderr.flush()  # Python 3 stderr is likely buffered.
        thread.interrupt_main()  # raises KeyboardInterrupt in the main thread

    def outer(fn):
        def inner(*args, **kwargs):
            timer = threading.Timer(s, quit_function, args=[fn.__name__])
            timer.start()
            try:
                result = fn(*args, **kwargs)
            finally:
                timer.cancel()
            return result
        return inner
    return outer

# I did not write the code above; it comes from https://stackoverflow.com/questions/492519/timeout-on-a-function-call

@exit_after(120)  # raises KeyboardInterrupt in the main thread after 120 seconds
def process(data):
    return summary(data)  # return the result so the loop below can store it

for i in tqdm(range(0,891)):
    try:
        df_summary['body_text'][i] = process(df['body_text'][i])
    except KeyboardInterrupt: # took over 120 seconds
        df_summary['body_text'][i] = preprocess(df['body_text'][i])

After 120 s, process is interrupted and preprocess runs instead. Let me know if anything is unclear.
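One caveat with this approach: interrupt_main() only takes effect once the main thread is back to executing Python bytecode, so a summary call stuck inside a long C-level operation may overrun the limit. As a hedged alternative sketch (not the code from the linked answer), the call can be handed to a worker thread and awaited with a timeout using the standard library's concurrent.futures. The names run_with_fallback, slow_summary, and fast_summary are hypothetical, and the preprocess shown here is only a stand-in for the real one. Note that a timed-out call is not killed: it keeps running in the background while the loop moves on.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_fallback(primary, fallback, arg, timeout_s):
    """Return primary(arg) if it finishes within timeout_s seconds, else fallback(arg)."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(primary, arg)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The primary call is still running in its thread; we just stop waiting.
        return fallback(arg)
    finally:
        # Do not block on the worker thread when leaving this function.
        executor.shutdown(wait=False)


# Hypothetical stand-ins for summary() and preprocess().
def slow_summary(text):
    time.sleep(0.5)  # simulates a summary that exceeds the limit
    return "summary:" + text


def fast_summary(text):
    return "summary:" + text


def preprocess(text):
    return "preprocessed:" + text


results = [
    run_with_fallback(fast_summary, preprocess, "row 0", timeout_s=0.1),
    run_with_fallback(slow_summary, preprocess, "row 1", timeout_s=0.1),
]
print(results)
```

In the loop from the question, the call would be run_with_fallback(summary, preprocess, df['body_text'][i], 120). If abandoned summaries piling up in the background is a problem, running the work in a separate process that can be terminated is the usual next step.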

S-c-r-a-t-c-h-y
  • I ran it up to iteration 45 and got this error: ```KeyError: 45 The above exception was the direct cause of the following exception: KeyError Traceback (most recent call last) 3 frames /usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance) 2898 return self._engine.get_loc(casted_key) 2899 except KeyError as err: -> 2900 raise KeyError(key) from err 2901 2902 if tolerance is not None: KeyError: 45``` Can you check? – Erwin Jun 05 '21 at 21:10
  • I don't think this comes from my code. – S-c-r-a-t-c-h-y Jun 05 '21 at 21:11
  • I tested iteration 46 and it's fine, but I am not really sure what causes the problem... – Erwin Jun 05 '21 at 21:13
  • It has to come from your process function or something, since the error refers to a file called "base.py". – S-c-r-a-t-c-h-y Jun 05 '21 at 21:14
  • Anyway, if it's coming from my code I can't do anything about it, because I just copied other people's code and adapted it to work with yours. – S-c-r-a-t-c-h-y Jun 05 '21 at 21:15
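
A possible explanation for the KeyError, not confirmed in this thread: df['body_text'][i] looks rows up by index label, not by position, so if the row labelled 45 was dropped earlier (for example by filtering or dropna), the label 45 no longer exists even though the dataframe still has a 46th row. The sketch below reproduces that situation on a small made-up dataframe and shows two ways around it: positional access with .iloc, and renumbering the index with reset_index(drop=True).

```python
import pandas as pd

# Made-up data: four rows, then the row labelled 2 is dropped.
df = pd.DataFrame({"body_text": ["a", "b", "c", "d"]})
df = df.drop(index=2)  # remaining index labels: 0, 1, 3

# Label-based lookup fails because label 2 no longer exists.
try:
    df["body_text"][2]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Option 1: access by position instead of by label.
third_by_position = df["body_text"].iloc[2]

# Option 2: renumber the index so labels match positions again.
df_reset = df.reset_index(drop=True)
third_after_reset = df_reset["body_text"][2]

print(label_lookup_failed, third_by_position, third_after_reset)
```

If this is the cause, resetting the index of both df and df_summary before the loop should let range(0, 891) line up with the rows again.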