0

I have seen a few posts about this already but have not been able to find a straightforward answer.

I have a fairly basic loop. It runs some SQL for each table name in a list and sends that output to a csv file. With a few thousand tables in the database, there are a few that are just massive, and the query takes forever. In the interest of getting on with life (and since this data isn't super important, I would like my loop to skip an iteration if the time takes longer than a minute.

Here is my loop:

for t in tablelist:
    df = pd.read_sql(sql=f''' select * from [DB].[SCHEMA].[{t}] ''', con=conn)
    df.to_csv(path, index=None)

1 Answers1

1

You should run pd.read_sql in an another Thread, you can use this utility functions:

import time
from threading import Thread

class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, Verbose=None):
        Thread.__init__(self, group, target, name, args, kwargs)
        self._return = None
    def run(self):
        print(type(self._target))
        if self._target is not None:
            self._return = self._target(*self._args,
                                                **self._kwargs)
    def join(self, *args):
        Thread.join(self, *args)
        return self._return

def call(f, *args, timeout = 5, **kwargs):
    i = 0
    t = ThreadWithReturnValue(target=f, args=args, kwargs=kwargs)
    t.daemon = True
    t.start()
    while True:
        if not t.is_alive():
            break
        if timeout == i:
            print("timeout")
            return
        time.sleep(1)
        i += 1
    return t.join()

def read_sql(a,b,c, sql="", con=""):
    print(a, b, c, sql, con)
    t = 10
    while t > 0:
        # print("t=", t)
        time.sleep(1)
        t -= 1
    return "read_sql return value"

conn = "conn"
t = "t"
print(call(read_sql, "a", "b", "c", timeout=10, sql=f''' select * from [DB].[SCHEMA].[{t}] ''', con=conn))

I got help from this answer.

by those functions:

for t in tablelist:
    df = call(pd.read_sql, timeout=yourTimeOutInSeconds , sql=f''' select * from [DB].[SCHEMA].[{t}] ''', con=conn)
    if df:
        df.to_csv(path, index=None)
S4eed3sm
  • 1,398
  • 5
  • 20
  • This was very helpful. I am trying to figure out how to terminate other functions that are being called inside the function call(). For example: result = call(long_lasting_function, timeout=300, arg1="x", arg2="y") def long_lasting_function(arg1=None, arg2=None): subfunction(arg1) subfunction(arg2) subfunction(arg1,arg2) In this case, if one of the subfunctions take too long and long_lasting_function is killed, the subfunction that is in progress continue to execute. How can I kill all subfunctions as well without having to wrap all subfunctions? – bachree Apr 20 '23 at 16:30