I have identical Python scripts that I need to run on multiple servers, all targeting the same table on a DB server. The script takes 5-20 seconds to run, and must run every 5 minutes.
Server1 --->  --------------
              |  DB Table  |
Server2 --->  --------------
The script operates on a single table that looks like this:
Type | many other fields | DirtyBit | Owner
--------------------------------------------
 X   |        ...        |  UnUsed  | NULL
 X   |        ...        |  UnUsed  | NULL
 X   |        ...        |  UnUsed  | NULL
 Y   |        ...        |  UnUsed  | NULL
 Y   |        ...        |  UnUsed  | NULL
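For reference, the table could be declared roughly like this; the table name WorkItems, the column types, and the pyodbc connection string are all placeholders for illustration, not details from the real system:

    import pyodbc

    # Placeholder connection string; substitute a real DSN/credentials.
    conn = pyodbc.connect("DSN=MyDatabase;UID=user;PWD=password")

    # Approximate shape of the table described above ("many other fields" elided).
    conn.execute("""
        CREATE TABLE WorkItems (
            Type     CHAR(1)     NOT NULL,                   -- 'X', 'Y', ...
            -- ... many other fields ...
            DirtyBit VARCHAR(10) NOT NULL DEFAULT 'UnUsed',  -- 'UnUsed' or 'InUse'
            Owner    VARCHAR(50) NULL                        -- e.g. 'Server1', or NULL
        )
    """)
    conn.commit()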
The script does the following:
1. Grab all records of type X (in a transaction) where DirtyBit is UnUsed and Owner is NULL.
2. Update all those records: set DirtyBit to InUse and Owner to Server1.
3. Perform some operations on the data in Python.
4. Update all the records according to the operations in step 3: set DirtyBit back to UnUsed and Owner back to NULL.
Because the script is running on multiple servers, the DirtyBit/Owner combination works to ensure the scripts aren't stepping on each other. Also, note that each row in the table is independent of all the others.
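To make the steps above concrete, a stripped-down version of the claim / process / release cycle might look like the sketch below. pyodbc, the table name WorkItems, and process_rows() are placeholders, and I've collapsed steps 1 and 2 into a single UPDATE so the claim is atomic without holding a lock across a separate SELECT:

    import socket
    import pyodbc

    SERVER_NAME = socket.gethostname()   # e.g. 'Server1'
    conn = pyodbc.connect("DSN=MyDatabase;UID=user;PWD=password")
    cur = conn.cursor()

    # Steps 1+2: claim the rows atomically.  Only rows that are still
    # UnUsed/NULL match, so two servers can never claim the same row.
    cur.execute("""
        UPDATE WorkItems
        SET DirtyBit = 'InUse', Owner = ?
        WHERE Type = 'X' AND DirtyBit = 'UnUsed' AND Owner IS NULL
    """, SERVER_NAME)
    claimed = cur.rowcount
    conn.commit()

    if claimed == 0:
        # Another server already owns the rows (or there is no work),
        # so this run does nothing.
        conn.close()
    else:
        # Step 3: read back only the rows this server owns and work on them.
        cur.execute("SELECT * FROM WorkItems WHERE Owner = ?", SERVER_NAME)
        rows = cur.fetchall()
        # results = process_rows(rows)   # stands in for the real Python work

        # Step 4: write the results back and release the rows.
        cur.execute("""
            UPDATE WorkItems
            SET DirtyBit = 'UnUsed', Owner = NULL
            WHERE Owner = ?
        """, SERVER_NAME)
        conn.commit()
        conn.close()

A side effect of claiming with a single UPDATE is that if another server has already grabbed the rows, the UPDATE matches nothing and the script simply exits, which is close to the "doesn't need to run" behaviour I describe below.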
Question: is this a sensible approach to getting the scripts to run concurrently? Is there any way the database can handle this for me (maybe by changing the transaction isolation level)? Ideally, if the scripts happen to run at the same time, I want this:
1. Script on Server 1 starts running.
2. Script on Server 2 starts running, notices that 1 is running, and thus decides it doesn't need to run.
3. Script on Server 1 finishes and updates all the data.
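For comparison, one thing the database itself can provide here is an application lock, which gives the "notices that 1 is running and skips" behaviour without touching the isolation level. A sketch, assuming SQL Server and pyodbc (the lock name 'TypeXJob' is just a placeholder; other databases have similar advisory locks, e.g. pg_try_advisory_lock in PostgreSQL):

    import pyodbc

    conn = pyodbc.connect("DSN=MyDatabase;UID=user;PWD=password", autocommit=True)
    cur = conn.cursor()

    # Try to take a named, session-scoped application lock without waiting.
    # sp_getapplock returns >= 0 if the lock was granted, < 0 otherwise.
    cur.execute("""
        SET NOCOUNT ON;
        DECLARE @result INT;
        EXEC @result = sp_getapplock @Resource    = 'TypeXJob',
                                     @LockMode    = 'Exclusive',
                                     @LockOwner   = 'Session',
                                     @LockTimeout = 0;
        SELECT @result;
    """)
    granted = cur.fetchone()[0] >= 0

    if not granted:
        conn.close()   # another server is already running the job; skip this cycle
    else:
        try:
            pass       # do the claim/process/release work from the sketch above
        finally:
            cur.execute("EXEC sp_releaseapplock @Resource = 'TypeXJob', @LockOwner = 'Session';")
            conn.close()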