2

I have a multi-file Python Project, of which many of the files make connections to an Azure SQL Database. The project works fine but, for some reason, one of the files stops being able to connect to the database after a while of the application running, and I can see no reason as to why; especially when other connection attempts work fine.

The connection string, for all the connections (so across all the files), is define as the following:

SQLServer = os.getenv('SQL_SERVER')
SQLDatabase = os.getenv('SQL_DATABASE')
SQLLogin = os.getenv('SQL_LOGIN')
SQLPassword = os.getenv('SQL_PASSWORD')

SQLConnString = 'Driver={ODBC Driver 17 for SQL Server};Server=' + SQLServer + ';Database=' + SQLDatabase + ';UID='+ SQLLogin +';PWD=' + SQLPassword

sqlConn = pyodbc.connect(SQLConnString,timeout=20)

And the function I am calling, when the error happens is below:

def iscaptain(guild,user):
    userRoles = user.roles
    roleParam = ""
    for role in userRoles:
        roleParam = roleParam + "," + str(role.id)
    cursor = sqlConn.cursor()
    roleParam = roleParam[1:]
    cursor.execute('EXEC corgi.GetUserAccess ?, ?;',guild.id,roleParam)
    for row in cursor:
        if row[1] == "Team Captain":
            cursor.close()
            return True
    cursor.close()
    return False

The error specifically happens at cursor.execute. I currently get the error

pyodbc.OperationalError: ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x68 (104) (SQLExecDirectW)')

Previously I didn't have the timeout in the connection on the specific file that was having a problem, and I did get a different error:

Communication link failure

Apologies, I don't have the full previous error.

Other connections, in other files in the same project, work fine, so the problem is not a network issue; if it were none of the connections would work. The problem only happens in one file, where all the connection attempts fail.

Googling the latest error really doesn't get me far. For example, there's a Github issue that gets nowhere, and this question isn't related as connecting works fie from other files.

Note, as well, that this happens after a period of time; I don't really know how long that period is but it's certainly hours. Restarting the project fixes the issue as well; the above function will work fine. That isn't really a solution though, I can't keep restarting the application ad-hoc.

The error is immediate as well; it's like Python/PyODBC isn't trying to connect. When stepping into the cursor.execute the error is generated straight after; it's not like when you get a timeout and you'll be waiting a few seconds, or more, for the timeout to occur.

I'm at a loss here. Why is the file (and only that one) unable to connect any more later on? There are no locks on the database either, so It's not like I have a transaction left hanging; though I would expect a timeout error again then as the procedure would be unable to gain a lock on the data.

Note, as well, that if I manually execute the procedure, in sqlcmd/SSMS/ADS, data is returned fine as well, so the Procedure does work fine. And, again, if I restart the application it'll work without issue for many hours.


Edit: I attempted the answer from Sabik below, however, this only broke to application, unfortunately. The solution they provided had the parameter self on the function validate_conn and so calling validate_conn() failed as I don't have a parameter for this "self". The method they said to use, just validate_conn didn't do anything; it doesn't call the function (which I expected). Removing the parameter, and references to self, also broke the application, stating that sqlConn wasn't declared even though it was; see image below where you can clearly see that sqlConn has a value: enter image description here Yet immediately after that line I get the error below:

UnboundLocalError: local variable 'sqlConn' referenced before assignment

So something appears to be wrong with their code, but I don't know what.

Mogo
  • 99
  • 11
  • Are you using long-lived pyodbc connections, or getting a new one from the connection pool for each request? I don't believe this is an issue for on-prem SQL Server, but Azure was known to have [an idle timeout of 30 minutes](https://azure.microsoft.com/en-us/blog/connections-and-sql-azure/). – AlwaysLearning Mar 09 '21 at 11:33
  • I won't lie, I'm not sure how i tell the difference between a "long lived" one and not, @AlwaysLearning . Could you elaborate at all? How do I know the connection is long lived or not? If it is a "long lived" connection, is there a method to "reset" it? – Mogo Mar 09 '21 at 11:35
  • Long-lived - do you initialise `sqlConn` once at the beginning of the file and then keep it around forever (more than 30 minutes), or do you initialise `sqlConn` frequently (every query or every few queries)? – Jiří Baum Mar 09 '21 at 11:38
  • It's initialised at the start of the file, @sabik . – Mogo Mar 09 '21 at 11:40
  • Right, so if it's initialised at the start of the file and the file then runs continuously, that's a long-lived connection. If the other files either reinitialise it periodically, or only run for a short while before exiting, those would be short-lived connections. – Jiří Baum Mar 09 '21 at 11:45
  • OK, so how do I change it to a short-lived connection, @sabik ? Do I have to repeat the `pyodbc.connect` line prior to *every* time I attempt to access the instance? That feels like awful lot of unneeded repetition of code if so. Do I then need some like of `pyodbc.close` function too after each call to the instance or will the connection automatically be closed "gracefully"? – Mogo Mar 09 '21 at 11:47
  • If it's indeed a 30-minute timeout, you can record the time you've opened the connection, then reconnect if it's been more than 25 minutes. – Jiří Baum Mar 09 '21 at 11:53
  • How would i do that, @sabik ? Are you able to post an answer demonstrating how to reopen the connection after 25+ minutes? – Mogo Mar 09 '21 at 11:55
  • Answer posted; not sure if this is the problem, but if it is, this would be one way of addressing it. – Jiří Baum Mar 09 '21 at 12:14

2 Answers2

2

One possibility being discussed in the comments is that it's a 30-minute idle timeout on the database end, in which case one solution would be to record the time the connection has been opened, then reconnect if it's been more than 25 minutes.

This would be a method like:

def validate_conn(self):
    if self.sqlConn is None or datetime.datetime.now() > self.conn_expiry:
        try:
            self.sqlConn.close()
        except:  # pylint: disable=broad-except
            # suppress all exceptions; we're in any case about to reconnect,
            # which will either resolve the situation or raise its own error
            pass

        self.sqlConn = pyodbc.connect(...)
        self.conn_expiry = datetime.datetime.now() + datetime.timedelta(minutes=25)

(Adjust as appropriate if sqlConn is a global.)

At the beginning of each function which uses sqlConn, call validate_conn first, then use the connection freely.

Note: this is one of the rare situations in which suppressing all exceptions is reasonable; we're in any case about to reconnect to the database, which will either resolve the situation satisfactorily, or raise its own error.

Edit: If sqlConn is a global, it will need to be declared as such in the function:

def validate_conn():
    global sqlConn, conn_expiry
    if sqlConn is None or datetime.datetime.now() > conn_expiry:
        try:
            sqlConn.close()
        except:  # pylint: disable=broad-except
            # suppress all exceptions; we're in any case about to reconnect,
            # which will either resolve the situation or raise its own error
            pass

        sqlConn = pyodbc.connect(...)
        conn_expiry = datetime.datetime.now() + datetime.timedelta(minutes=25)

As an unrelated style note, a shorter way to write the function would be using (a) a with statement and (b) the any operator, like this:

    with sqlConn.cursor() as cursor:
        roleParam = roleParam[1:]
        cursor.execute('EXEC corgi.GetUserAccess ?, ?;', guild.id, roleParam)
        return any(row[1] == "Team Captain" for row in cursor)

The with statement has the advantage that the cursor is guaranteed to be closed regardless of how the code is exited, even if there's an unexpected exception or if a later modification adds more branches.

Jiří Baum
  • 6,697
  • 2
  • 17
  • 17
  • 1
    Thanks, Sabik. I'll give have a look at see how I get on. If this does work, it might be a little before I accept as a solution, as it *can* sometimes take several hours for the problem to arise. – Mogo Mar 09 '21 at 12:16
  • Hmm, I note you state that I still need to put a call to `validate_conn` prior to every attempt to access the database; wouldn't it therefore be better to do just open the connection, and then close it, in the function? What benefit does this method have over it, as it's still a case of adding 10's of lines of code prior to every attempt to access the instance (something i was hoping to avoid). – Mogo Mar 09 '21 at 12:18
  • It's one line of code at the start of each function (calling `validate_conn()`). The advantage is that it only reopens the connection once every 25 minutes, rather than on every use; reopening on every use could be quite slow. However, if performance turns out not to be an issue, you can certainly instead have a function `get_conn()` which opens a new connection each time. – Jiří Baum Mar 09 '21 at 12:24
  • I see. I'll have an experiment and let you know. Many thanks. Again, apologies if you don't hear anything for a while, I've not forgotten to come back and upvote/accept. :) – Mogo Mar 09 '21 at 12:26
  • Also, if I did want to go down the route of opening the connection each time, would I assume I would need a `sqlConn.Close()`? – Mogo Mar 09 '21 at 12:35
  • Doesn't look like this works. Putting in `validate_conn` doesn't call the function, and using `validate_conn()` states I'm missing a parameter. I don't have a parameter to pass? – Mogo Mar 09 '21 at 13:09
  • I removed the references to `self`, including the parameter, and this gave the error *"local variable 'sqlConn' referenced before assignment"* on the line `if sqlConn is None or datetime.datetime.now() > conn_expiry`, so this doesn't solve the problem I'm afraid. Many thanks for trying. – Mogo Mar 09 '21 at 13:13
  • If it's a global variable, it needs to be declared as `global` in the function: `global sqlConn, conn_expiry` – Jiří Baum Mar 10 '21 at 03:06
  • Updated in the answer. – Jiří Baum Mar 10 '21 at 03:10
1

Although the solution for Sabik didn't work for me, the answer did push me in the right direction to find a solution. That was, specifically, with the use of the with clauses.

Instead of having a long lasting connection, as I have been informed I had, I've now changed these to short lived connections with I open with a with, and then also changed the cursor to a with as well. So, for the iscaptain function, I now have code that looks like this:

def iscaptain(guild,user):
    userRoles = user.roles
    roleParam = ""
    for role in userRoles:
        roleParam = roleParam + "," + str(role.id)
    #sqlConn = pyodbc.connect(SQLConnString,timeout=20)
    with pyodbc.connect(SQLConnString,timeout=20) as sqlConn:
        with sqlConn.cursor() as cursor:
            roleParam = roleParam[1:]
            cursor.execute('EXEC corgi.GetUserAccess ?, ?;', guild.id, roleParam)
            return any(row[1] == "Team Captain" for row in cursor)
    return False

It did appear that Azure was closing the connections after a period of time, and thus when the connection was attempted to be reused it failed to connect. As, however, both hosts are in Azure, but the Server running the Python application and the SQL Database, I am happy to reconnect as needed, as speed should not be a massive issue; certainly it hasn't been during the last 48 hours of testing.

This does mean i have a lot of code to refactor, but for the stability, it's a must.

Mogo
  • 99
  • 11