I have this loop:

for index, row in df.iterrows():
    process_row(index, row)

where process_row is a function that calls an API twice:

def process_row(index, row):
    print("Evaluating row index:", index)
    question = row["Question"]
    answer = row["Answer"]
    instruct = "..."
    instruct2 = "..."

    try:
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=[{"role": "user", "content": instruct}]
        )    
        response = completion["choices"][0]["message"]["content"]

        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=[{"role": "user", "content": instruct2}]
        )
        response2 = completion["choices"][0]["message"]["content"]

        .... OTHER CODE ....
    except Exception as e:
        print(e)

If the whole function takes more than 30 seconds for a single iteration, I want it to stop and run this instead:

min_vote = 10
row_with_vote = row.tolist() + [min_vote]
passed_writer.writerow(row_with_vote)

How can I do this? I tried something with concurrent.futures but saw no improvement; I can add that attempt to the post if it helps. I have seen other posts, but they check the elapsed time after every instruction, and I'm fairly sure that wouldn't help in my case because the program gets stuck on a single line. Also, what could make the function this slow? Most iterations take just a couple of seconds, but occasionally one takes 10 minutes or more, so something is going wrong.

Brian Tompsett - 汤莱恩
Lorenzo Cutrupi

1 Answer

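Pulling from this answer, try using the signal module to define a timeout. Note that signal.SIGALRM is only available on Unix, and the handler must be registered from the main thread.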

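import signal

def signal_handler(signum, frame):
    # Runs when the alarm fires; the exception interrupts whatever is executing
    raise TimeoutError("timeout function")

def long_function_call():
    while True:
        pass

signal.signal(signal.SIGALRM, signal_handler)
signal.alarm(3)   # Three seconds
try:
    long_function_call()
except TimeoutError:
    print("Timed out!")
finally:
    signal.alarm(0)   # cancel the alarm if the call finished in time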
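The same idea can be wrapped in a context manager so the alarm is always cancelled and the previous handler restored. This is only a sketch under a few assumptions: the names RowTimeout, time_limit and run_with_limit are made up for illustration, and it relies on SIGALRM, so it is Unix-only and must run in the main thread.

```python
import signal
import time
from contextlib import contextmanager


class RowTimeout(Exception):
    """Hypothetical exception type raised when the alarm fires."""


def _raise_timeout(signum, frame):
    raise RowTimeout("time limit exceeded")


@contextmanager
def time_limit(seconds):
    # SIGALRM exists only on Unix and must be used from the main thread.
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def run_with_limit(func, seconds):
    """Return (True, result), or (False, None) if func ran out of time."""
    try:
        with time_limit(seconds):
            return True, func()
    except RowTimeout:
        return False, None
```

Using a dedicated exception class means the caller can tell a timeout apart from any other error, which a bare except cannot do.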

So your code could look something like this:

import signal
import time
import pandas as pd
import csv

# dummy function: sleeps for `index` seconds
def process_row(index, row):
    time.sleep(index)
    print(f"Processed index {index}")

# dummy data
df = pd.DataFrame(columns=["a"], index=range(10))

def signal_handler(signum, frame):
    raise TimeoutError("timeout function")

# register the handler once, before the loop
signal.signal(signal.SIGALRM, signal_handler)

with open("./tmpcsv.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for index, row in df.iterrows():
        signal.alarm(5)   # 5 second timeout
        try:
            process_row(index, row)
        except TimeoutError:
            print("Timed out!")
            writer.writerow(row)
        finally:
            signal.alarm(0)   # cancel the pending alarm

Processed index 0
Processed index 1
Processed index 2
Processed index 3
Processed index 4
Timed out!
Timed out!
Timed out!
Timed out!
Timed out!
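One thing to watch for with the process_row from the question: it wraps its body in `except Exception`, so an exception raised by the signal handler inside that block would be caught and printed there and would never reach the loop. A possible way around this, sketched below, is to re-raise the timeout before the generic handler. RowTimeout and on_alarm are hypothetical names, the sleep stands in for the API calls, a plain list of rows replaces the DataFrame to keep the sketch dependency-free, and the timeout is 1 second only so the example finishes quickly.

```python
import csv
import io
import signal
import time


class RowTimeout(Exception):
    """Hypothetical exception used only to signal a timeout."""


def on_alarm(signum, frame):
    raise RowTimeout()


def process_row(index, row):
    # Stand-in for the real function: row 1 is artificially slow.
    try:
        time.sleep(3 if index == 1 else 0)
    except RowTimeout:
        raise  # let the timeout escape to the loop
    except Exception as e:
        print(e)  # all other errors are handled as in the question


rows = [["q0", "a0"], ["q1", "a1"], ["q2", "a2"]]  # stand-in for df.iterrows()
buffer = io.StringIO()  # stands in for the file behind passed_writer
passed_writer = csv.writer(buffer)
min_vote = 10

signal.signal(signal.SIGALRM, on_alarm)  # Unix only, main thread only
for index, row in enumerate(rows):
    signal.alarm(1)  # 1 second here; the question asks for 30
    try:
        process_row(index, row)
    except RowTimeout:
        # With a pandas row this would be row.tolist() + [min_vote]
        passed_writer.writerow(list(row) + [min_vote])
    finally:
        signal.alarm(0)  # cancel the alarm if the row finished in time
```

Only the slow row ends up in the writer, with min_vote appended as its last column.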
astroChance
  • I don't know if it's working as it should. I should add my code in the except block, right? But if I insert print("Skipping to next row") there, the program prints only "Timed out" from signal_handler – Lorenzo Cutrupi Aug 15 '23 at 15:50
  • No, your code should be in the `try` block. I've updated the answer for clarity so you'll see the printout is coming from the exception thrown within the loop. – astroChance Aug 15 '23 at 15:57
  • I meant the three separate lines of code at the end of my question, the ones to run when the time limit is exceeded – Lorenzo Cutrupi Aug 15 '23 at 16:17
  • Yes, you can add additional code to the `except` block that will be executed when the timeout occurs. I'm assuming the code you provided is writing a csv file; my updated answer writes a line to a csv file for each loop iteration that timed out. – astroChance Aug 15 '23 at 16:38
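Since the question mentions concurrent.futures, here is a rough sketch of how that route might look on platforms where SIGALRM is not available. call_with_timeout is a hypothetical helper, not part of any library. The main caveat is that a thread cannot be killed: on timeout the loop moves on, but the stuck call keeps running in the background until it returns, and the interpreter may wait for it at exit.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def call_with_timeout(func, seconds, *args):
    """Return (True, result), or (False, None) if func is still running."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return True, future.result(timeout=seconds)
    except TimeoutError:
        return False, None
    finally:
        # Do not wait for a stuck worker; it keeps running in the background.
        executor.shutdown(wait=False)
```

In the loop this could be called as `finished, _ = call_with_timeout(process_row, 30, index, row)`, writing the fallback row when `finished` is False. Using `with ThreadPoolExecutor(...)` instead would wait for the worker when the block exits, which may explain seeing no improvement.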