
In my code, I have a function that takes a number as input. This input heavily affects the function's running time. This is the line where I call the function, with an input value of 0.35.

frequent_itemsets = get_frequent_items(0.35)

The get_frequent_items function returns a DataFrame, and later in the code I use this DataFrame for other computations, so I need the function to return the DataFrame (here called frequent_itemsets) before the rest of the code can continue.

The input value (0.35 in the example) affects the running time heavily: with 0.35 the function takes 28 seconds to return, while with 0.3 it takes 2 hours.

I am thinking of limiting the allowed input values for the function to the following options:

var_support_options = [0.18, 0.2, 0.25, 0.3, 0.35]

My question is: is there a way to write the code so that it tries the function with the input values from the var_support_options list, starting from the lowest value and moving to the largest?

Example of the desired process:

Iteration 1: frequent_itemsets = get_frequent_items(0.18)

  • If this iteration takes more than 30 seconds, stop it and try the next value in the input list (0.2 in the example).

  • Otherwise, if it takes less than 30 seconds, return the frequent_itemsets DataFrame and continue with the code.

I want the function to finish in less than 30 seconds using the smallest possible input value, then return the result and continue to the next lines of code.

Should I do that using multithreading, multiprocessing, or something else? And what should the code look like?

    Does this answer your question? [How to limit execution time of a function call?](https://stackoverflow.com/questions/366682/how-to-limit-execution-time-of-a-function-call) – matszwecja Aug 11 '22 at 11:10
  • Hello, thanks for your suggestion. However, this is not what I was looking for: in my code, the next lines depend on the output of this function, so the function needs to return the DataFrame. Mainly, I want a way to run the function while the program waits for it to return, but without letting it take more than 30 seconds; if it takes longer, try another input value. – AhmadKhoder Aug 11 '22 at 12:58
  • Either you are not explaining your use-case correctly or that is exactly what is solved in the linked question. – matszwecja Aug 11 '22 at 13:02
  • Don't stop the main code; run a loop which periodically checks whether there is a result from the other process and whether it has taken more than 30 seconds. If you get a result, you can exit the loop. If it takes more than 30 seconds, you can kill the process and start a new one with the next value - and use the loop again to check it. – furas Aug 11 '22 at 13:08

1 Answer


You can use multiprocessing to run the code, because a process has a method to kill/terminate it. But it needs a queue to send the result back to the main process (processes don't share memory, so they can't use a global variable).

The main process then runs a loop which periodically checks whether there is a result in the queue and whether it is time to kill/terminate the other process.

One problem is that, because processes don't share memory, the main process has to send the data to the worker process, and the data is serialized with pickle - so for big data this may need extra time.


A minimal working example is below. It is similar to the examples in the answers to the question @matszwecja suggested: [How to limit execution time of a function call?](https://stackoverflow.com/questions/366682/how-to-limit-execution-time-of-a-function-call)

import multiprocessing
import time

def get_frequent_items(queue, value):
    # simulate work that takes a different amount of time depending on the value
    time.sleep(10 - (value * 10))

    # send the result back to the main process
    queue.put(value * 2)

def run(value, timeout=30):
    
    # queue used to get the result back from the worker process
    q = multiprocessing.Queue()

    # start process
    p = multiprocessing.Process(target=get_frequent_items, args=(q, value))
    p.start()

    start = time.time()

    while True:

        time.sleep(0.1)  # reduce CPU consumption

        end = time.time()
        
        print(f'time: {end-start:.1f}', end='\r')

        if q.empty():                # check if there is result in queue
            if end-start > timeout:  # check if it is time to kill process
                p.terminate()
                return None          # return None when there is no result
        else:
            return q.get()           # return result
        
# ---- main ---

if __name__ == '__main__':
    
    var_support_options = [0.18, 0.2, 0.25, 0.3, 0.35]

    for value in var_support_options:

        result = run(value, timeout=7)

        print('result:', result, 'for', value)

        # exit the loop when you get the first result
        # (check against None because `if result:` is ambiguous for a DataFrame)
        if result is not None:
            break

    # --- after loop ---

    if result is not None:
        print('final result:', result, 'for', value)
    else:
        print('no result')
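
If you would rather not manage the queue yourself, here is a minimal sketch of a variant of the same idea: a single-worker `multiprocessing.Pool` passes the result back internally, `AsyncResult.get(timeout=...)` raises `multiprocessing.TimeoutError` when the limit is exceeded, and `pool.terminate()` then kills the worker. The body of `get_frequent_items` below is only a placeholder for the real computation.

import multiprocessing
import time

def get_frequent_items(value):
    # placeholder body - the real function would compute and return a DataFrame
    time.sleep(10 - value * 10)
    return value * 2

def run_with_pool(value, timeout=30):
    # a single-worker pool sends the result back internally,
    # so no explicit Queue has to be managed here
    with multiprocessing.Pool(processes=1) as pool:
        async_result = pool.apply_async(get_frequent_items, (value,))
        try:
            return async_result.get(timeout=timeout)
        except multiprocessing.TimeoutError:
            pool.terminate()  # kill the worker that is still running
            return None

if __name__ == '__main__':
    result = None
    for value in [0.18, 0.2, 0.25, 0.3, 0.35]:
        result = run_with_pool(value, timeout=7)
        if result is not None:
            break
    print('result:', result)

The trade-off is the same: the arguments and the returned DataFrame still have to be pickled; only the queue handling moves out of your code.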
furas
  • I would like to thank you. Indeed, I have seen the answer proposed by @matszwecja, and I would like to thank him as well. However, I am developing an application with Django, so I would like to avoid queues as much as possible, because when I use a queue in my code it blocks my application in the main process, and all other processes are blocked until the current one finishes. What I want is to be able to access other features of the application while the function runs in the background, without blocking the main process. – AhmadKhoder Aug 16 '22 at 14:39
  • If you need to run something which doesn't need results from the processes, then run it in a separate thread - or run the `while True` loop in a separate thread (a minimal sketch of this appears after these comments). If it needs the results from the processes, then you have to wait on the queue. But the queue shouldn't block the main code when it is empty - it may block only when it has to get() data from the queue, and only when the data is big. In web pages a common problem is that the server has to send information to the user within 30 seconds (for security reasons - see [Denial-of-service attack](https://en.wikipedia.org/wiki/Denial-of-service_attack)). – furas Aug 16 '22 at 15:24
  • Some Django pages send the task to a separate process (using [Celery](https://docs.celeryq.dev/en/stable/) or similar programs) and send the user back a page with a progress bar (or another animation); this page runs JavaScript which periodically asks the server whether there is a new result, and the JavaScript updates the progress bar. – furas Aug 16 '22 at 15:29
  • In my code, I need to return a DataFrame from the process. I have used the suggested code snippet; however, it always blocks my main process, so I am stuck on this page of the application. – AhmadKhoder Aug 16 '22 at 15:36
  • I can't run your code, so I can only suggest using `print()` to see which part of the code is executed and where it stops. And when you use `q.get()`, remember to check `q.empty()` first - if the queue is empty, then `.get()` will wait for data and block the code. – furas Aug 16 '22 at 15:41
  • I have already debugged my code using prints. The problem for me is that I want to avoid queues in my code, because, as you highlighted in the previous comments, using queues to pass big data can cause problems, and this is my case. So I am trying to fix the problem without using a queue. It is just a piece of code (several lines) that I want to run; this piece contains a function with a parameter, and I want to give that piece of code 30 seconds to return a value. If it does not return, modify the parameter of the function - but while avoiding queues. – AhmadKhoder Aug 18 '22 at 12:56
  • I don't know your code and I have no idea how it works or what the real problem is. Maybe the queue is not the problem, but you need to use it in a different way. – furas Aug 18 '22 at 13:04
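
To illustrate the thread-based suggestion in the comments above, here is a minimal sketch that moves the whole retry loop into a background thread, so the main code keeps running and only polls a shared `state` dict (a hypothetical holder introduced for this sketch) until the result is ready. The `run()` below is just a placeholder standing in for the `run()` helper from the answer; for a production Django app, handing the work to Celery, as mentioned in the comments, is the more robust option.

import threading
import time

def run(value, timeout=30):
    # placeholder standing in for run() from the answer above,
    # which starts the worker process and enforces the timeout
    time.sleep(1)
    return value * 2 if value >= 0.3 else None

# shared holder that the background thread fills in
state = {'result': None, 'done': False}

def worker():
    # try each support value with a time limit, smallest value first
    for value in [0.18, 0.2, 0.25, 0.3, 0.35]:
        result = run(value, timeout=30)
        if result is not None:
            state['result'] = result
            break
    state['done'] = True

thread = threading.Thread(target=worker, daemon=True)
thread.start()

# the main code is free to do other work here and can periodically
# check state['done'] (for example from a polling view) to pick up the result
while not state['done']:
    time.sleep(0.5)
print('result:', state['result'])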