Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. What to do when coding standards aren't enough?

Question

Our shop has heavy T-SQL stored procedure logic, supporting a web application connecting through JDBC. SQL Server 2008 R2.

From time to time, one of the SQL developers makes a mistake and writes a stored procedure that leaves a transaction open. If you've ever seen this happen in the wild, it's pretty ugly. The connection gets released back to the pool with its rogue, uncommitted transaction. Subsequent transactions seem to be working just fine, but locks are actually accumulating, and work is actually uncommitted. Some period of time later, the blocking gets severe enough that users or monitoring software notices it. It's easy enough to identify the head of the blocking chain and the open transaction, but which procedure left the transaction open, and what can you do about it? At this point the best response seems to be to kill the spid, clear up the blocking, and suffer the consequences.

We have a number of coding standards and practices to prevent this, but a stray RETURN or poorly nested IF block or TRY/CATCH block can cause havoc, usually in some edge case that wasn't fully covered.

The question, for the tl;dr crowd, is this: What techniques are available to ensure that a procedure that starts a transaction doesn't accidentally leave it open on exit, even if the developer makes a dumb mistake?

I realize this is a big ask, but hear me out. We're not looking to prevent any consequences of a coding mistake. We're just looking to contain the consequences to the specific transaction. In short, we'd like to change the behavior of error 266 ("Transaction count after EXECUTE...") to just rollback all transactions. Or even really to commit all transactions. Our fall into the SQL Try/Catch block, so the procedure can respond to the mismatch itself. (If I were king of the world, that would be my choice-- let me catch the SQL runtime error in SQL, and I'll decide what to do.)

Let's say our problem code is in the proc AddProductToRecentlyViewed. Someone recently modified it to do this:

IF @UserName='Grue' RETURN -100;  -- Grue is a jerk.

Grue shows up, looks at some products, doesn't see them in Recently Viewed, and goes to lunch. Slowly, this transaction gets passed to the next web transaction, and everything grinds to a halt. Worse, we have actual purchases that seem to be committed, that are in danger of being rolled back. On the operations side, we pretty quickly know there was an error, but it takes a fair bit of digging, and a fair bit of time, to trace it back to this procedure.

It's one thing to absently break Recently Viewed. It's quite another to bring down the whole application.

We do have a standard try/catch block in SQL, similar to the one recommended here. We have unit tests and code reviews, etc. In fact, the error that got through was caught in code review and testing, and corrected-- but the pre-corrected version got promoted accidentally. My VP says, "We're never going to prevent every single dumb mistake. How can we make the next dumb mistake less painful?"

If we had a way to ensure that the procedure just plain couldn't leave a transaction open, we could suffer the consequences of AddProductToRecentlyViewed not working. But I can't find any configuration value or runtime setting that gives me this effect.

Does anyone know of a way to make this more bullet-proof, besides just not making mistakes in T-SQL code?

Here's a couple of things we're considering:

Implement a standard error handler in the application, to watch for Error 266 and do something. Presumably we'd issue an explicit ROLLBACK from the application error handler, or call a stored proc that just checks XACT_STATE and rolls back and logs. (This is a little challenging for us due to the fact that we don't have a central error handler for DAO's, so it might take some time to add in. Also, we'd really rather enforce the "no open transactions" rule in the database, rather than depending on all calling apps doing it right.)
On one of our most frequently called stored procs, add in IF XACT_STATE<>0 ROLLBACK or IF XACT_STATE<>0 COMMIT to the very beginning of the proc. This is pretty kludgy, but it has the effect of putting constant downward pressure on the transaction count. It doesn't help identify the root cause, and we'd still have a little time window between when the tran is left open, and when this frequent proc shoots it down. But it would make the wound fairly self-healing while we responded to the log and monitors.

Are there other settings or configurations that help detect and prevent this error, or other solutions you've used to contain it?

Have you considered executing `IF @@TRANCOUNT > 0 ROLLBACK` each time a connection is opened? — Dan Guzman, Oct 25 '15 at 17:12
Dan, thanks. We have talked about that-- I'm working with our app team to see if it's feasible and if it fits in the application. This is a ColdFusion application, so I'm fairly certain they just declare a with a datasource attribute that points to a known datasource on the machine. CF handles connection pooling outside of that, so when the connection goes out of scope, CF returns it to the pool and uses it for the next request. (If it didn't do this, the open transaction wouldn't matter.) I don't think there's a place to "initialize" the connection in this way... — SteveSmithSQL, Oct 25 '15 at 19:13
I believe the query would need to be done at the app layer because JDBC (at least the Microsoft driver) relies on the app server to handle to actual pooling. This differs from other some other APIs where the pooling is done by the API itself and the connection cleaned up upon reuse with a sp_reset_connection RPC call. — Dan Guzman, Oct 25 '15 at 20:35

Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. What to do when coding standards aren't enough?

0 Answers0