
I am trying to run an apriori analysis on a series of hashtags scraped from Twitter, in Python using JupyterLab, and need a way to time out a function after a certain period of time. The function runs in a while loop that incrementally reduces the support value and stops after ten seconds have passed.

import time
from apyori import apriori  # assumption: the keyword arguments below match the apyori package

def association_rules(hashtags_list):
    # Convert the list of hashtags into a list of transactions
    transactions = [hashtags for hashtags in hashtags_list]
    # Initialize the support
    min_support = 1
    # Initialize the confidence
    min_confidence = 0.1
    # Initialize the lowest support
    lowest_support = 1
    # Default result, so the return statement works even if the first call fails
    association_rules_list = []

    # Start the timer
    start_time = time.time()
    while True:
        try:
            # Find the association rules (named "rules" to avoid shadowing this function)
            rules = apriori(transactions, min_confidence=min_confidence, min_support=min_support)
            # Convert the association rules into a list
            association_rules_list = list(rules)
            # End the timer
            end_time = time.time()
            # Calculate the running time
            running_time = end_time - start_time
            
            # check if running time is over the maximum time
            if running_time >= 10:
                break
            lowest_support = min_support
            if min_support > 0.01:
                min_support = min_support - 0.01
            else:
                min_support = min_support - 0.005
                
            if min_support <= 0:
                min_support = 0.01
                
        except Exception as e:
            print("An error occurred:", e)
            break
    
    return association_rules_list, round(lowest_support, 3)

The problem is that the timeout check sits inside the loop itself, so the loop can hang if the apriori support value gets too low before the 10 seconds are up, which often happens with small datasets. I therefore need something external to the function to stop the loop.

I've been looking into parallel processing with no success, and still can't really determine if it can even be carried out in Jupyter Lab.

Any ideas on how to stop a function would be appreciated.

Edited to add that I am running on Windows 10, which may affect some options.

ARH
    You can use the signal library as described in https://stackoverflow.com/questions/366682/how-to-limit-execution-time-of-a-function-call – Marc Morcos Jan 25 '23 at 16:24

3 Answers


Edit: After some discussion in the comments, here's a different suggestion.

You can terminate a function from outside the function by causing an exception to be thrown via the signalling mechanism as in the example below:

import signal
import time
import random

def out_of_time(signum, frame):
    raise TimeoutError

def slow_function():
    time.sleep(random.randint(1, 10))
    return random.random()

signal.signal(signal.SIGALRM, out_of_time)
signal.alarm(5)

v = 0
while True:
    try:
        v = slow_function()
    except TimeoutError:
        print("ran out of time")
        break

print("v:", v)

What's going on here is that we have a function, slow_function, that will run for an unknown period of time (1-10 seconds). We run that in a while True loop. At the same time, we've set up the signal system to throw a TimeoutError after 5 seconds. So when that happens, we can catch the exception.

A few things to note though:

  1. This does not work on Windows, where signal.SIGALRM and signal.alarm are not available.
  2. There is absolutely NO GUARANTEE that the code will be where you think it will be when the exception is thrown. If the loop is already completed and the interpreter is not currently running slow_function, you don't know what will happen. So you'll need to 'arm' the exception-throwing mechanism somehow, for example by checking the frame parameter that is passed to out_of_time to make sure that the exception is only thrown if the signal comes while we're inside the expected function.
  3. This is kind of the Python equivalent of a goto in the sense that it causes the execution to jump around in unexpected ways.
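As a rough illustration of the "arming" idea in point 2, the handler below walks up the call stack from the interrupted frame and raises only if slow_function is actually on it. This is a sketch, not a definitive implementation: it is POSIX-only like the example above, it must run in the main thread, and the 0.2 second limit and the use of signal.setitimer (to allow a fractional timeout) are choices made for the illustration.

```python
import signal
import time

def slow_function():
    time.sleep(5)  # stand-in for a long-running computation
    return "finished"

def out_of_time(signum, frame):
    # Walk up the call stack from the interrupted frame and raise only
    # if slow_function is somewhere on it; otherwise ignore the signal.
    while frame is not None:
        if frame.f_code is slow_function.__code__:
            raise TimeoutError
        frame = frame.f_back

signal.signal(signal.SIGALRM, out_of_time)
signal.setitimer(signal.ITIMER_REAL, 0.2)  # deliver SIGALRM after 0.2 seconds

try:
    result = slow_function()
except TimeoutError:
    result = "timed out"
finally:
    signal.setitimer(signal.ITIMER_REAL, 0)        # cancel any pending timer
    signal.signal(signal.SIGALRM, signal.SIG_DFL)  # reset to the default handler

print(result)
```

If the signal arrives while the interpreter is outside slow_function, the handler returns without raising and execution simply continues.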

A better solution would be to insert some code into the function that you want to terminate to periodically check to see if it should keep running.


Change the while loop from while True to:

while time.time()-start_time < 10 and <some other criteria>:
    # loop body

Then you can get rid of the break and you can add whatever halting criteria needed to the loop statement.
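One way the "periodically check" idea could look, assuming the slow call hands back its results lazily (as an iterator or generator) rather than all at once, is sketched below. The names take_until_deadline and slow_numbers are hypothetical helpers made up for this example. Note the limitation: the clock is only checked between items, so a single item that takes very long to produce is not interrupted.

```python
import time
from itertools import count

def take_until_deadline(iterable, seconds):
    """Collect items from an iterable until it ends or the time budget is spent.

    Returns (items, finished); finished is True only if the iterable was
    exhausted without the deadline check tripping.
    """
    deadline = time.monotonic() + seconds
    items = []
    for item in iterable:
        items.append(item)
        # The check runs between items, never in the middle of producing one
        if time.monotonic() >= deadline:
            return items, False
    return items, True

def slow_numbers(delay):
    # Hypothetical stand-in for an expensive lazy computation
    for n in count():
        time.sleep(delay)
        yield n

# An endless, slow source is cut off after roughly 0.1 seconds
items, finished = take_until_deadline(slow_numbers(0.01), 0.1)

# A short, fast source runs to completion
short, done = take_until_deadline([1, 2, 3], 1.0)
```

The partial results collected before the deadline are still returned, so the caller can decide whether they are usable.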

Simon Lundberg
  • How does that solve the problem of the algorithm getting hung up while inside the loop before the time threshold though? – ARH Jan 25 '23 at 16:24
  • I might have misunderstood the problem. How does the apriori function cause the hang? Is the problem that the apriori function itself takes too long to run once? I assumed the problem was that it was getting called too many times. – Simon Lundberg Jan 25 '23 at 17:54
  • The problem is that the algorithm can take too long to run, as it can reach a point where it gets exponentially more complex to calculate as its support value decreases and use up all system memory. I want to be able to run the algorithm for a maximum of ten seconds, incrementally lower the support value each time, and if it happens to reach a support value where the calculation gets too complex, the algorithm is stopped, and the last support value is returned. – ARH Jan 25 '23 at 18:07
  • In that case you need to add some termination signal to the function you’re calling. You can’t* terminate a function from outside the function. *sort of – Simon Lundberg Jan 25 '23 at 18:51

This is what I have so far, which seems to work, but I could be wrong...

import time
import func_timeout
import tempfile

timeout_time = 5
# Module-level handle, so the except branch below can reopen the file by name
temp_file = tempfile.NamedTemporaryFile(delete=False)

def test_function():
    highest_n = 0
    n = 0
    start_time = time.time()
    while time.time()-start_time < timeout_time:
        n += 1
        if n > highest_n:
            highest_n = n
            # Overwrite from the start so the file only ever holds the latest pair
            temp_file.seek(0)
            temp_file.write(str(n).encode() + b' ' + str(highest_n).encode())
    temp_file.close()
    return highest_n

try:
    result = func_timeout.func_timeout(timeout_time, test_function)
    print("highest number counted to: ",result)
except func_timeout.FunctionTimedOut:
    print("Function Timed Out after ", timeout_time, " seconds")
    # Close the handle first so buffered writes reach the file
    temp_file.close()
    with open(temp_file.name, 'r') as f:
        n, highest_n = f.read().split()
    n = int(n)
    highest_n = int(highest_n)
    print("highest number counted to: ",highest_n)
ARH

I think I found a simpler way just using a list rather than a temporary file.

import time
from func_timeout import func_timeout, FunctionTimedOut

start_time = time.time()
temp_list = []

running_time = 3.3

def while_loop():
    
    min_support = 1
    lowest_support = 1
    
    while True:
        time.sleep(0.1)
        if lowest_support > 0.01:
            lowest_support -= 0.01
        else:
            lowest_support -= 0.001
            
        if lowest_support <= 0:
            lowest_support = 0.001
        temp_list.append(lowest_support)
        if time.time() - start_time > running_time:
            break

try:
    func_timeout(running_time, while_loop)
except FunctionTimedOut:
    pass

temp_list[-1]
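For comparison, the same "record progress in a shared list" pattern can be sketched with only the standard library, using a worker thread and a threading.Event as a stop flag. This is a variation for illustration, not the func_timeout approach above: the worker has to check the flag itself, so it cannot interrupt a single call that never returns. The name lower_support and the short 0.2 second budget are made up for the example.

```python
import threading
import time

progress = []             # shared list: the worker appends each value it reaches
stop = threading.Event()  # set by the caller when the time budget is spent

def lower_support():
    support = 1.0
    while not stop.is_set() and support > 0.01:
        time.sleep(0.01)  # stand-in for the real work at this support value
        support = round(support - 0.01, 3)
        progress.append(support)

worker = threading.Thread(target=lower_support, daemon=True)
worker.start()
worker.join(timeout=0.2)  # wait at most 0.2 seconds for the worker
stop.set()                # ask the worker to stop at its next check
worker.join()             # returns once the current iteration finishes

last = progress[-1] if progress else None
print(last)
```

Because the list is filled as the worker goes, the last value reached is available even when the time limit cuts the run short.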
ARH