
I have a multi-process scenario: multiple Windows applications connect to the database from different PCs.

I have a simple construct of two tables:

  • I have a tblSessions where each process has one entry, using an auto-increment int Id field.
  • I have a tblJobs that references the Id from tblSessions with a cascading delete, so when a session is deleted, all its jobs are deleted too. tblJobs also has an auto-increment int Id field. Records in tblSessions are updated on a regular basis; creation and deletion of records is a rare event.
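For clarity, the construct above could look roughly like this in T-SQL (a sketch; all column names other than Id and Data are assumptions, only the types mentioned in the text are used):

```sql
CREATE TABLE tblSessions (
    Id       INT IDENTITY(1,1) PRIMARY KEY,
    -- session bookkeeping columns omitted; updated on a regular basis
    LastSeen DATETIME NOT NULL
);

CREATE TABLE tblJobs (
    Id        INT IDENTITY(1,1) PRIMARY KEY,
    SessionId INT NOT NULL
        REFERENCES tblSessions(Id) ON DELETE CASCADE,  -- cascading delete
    -- per row: 5 INTs, 1 DATETIME, 1 uniqueidentifier
    Data      INT NOT NULL,
    CreatedAt DATETIME NOT NULL,
    JobGuid   UNIQUEIDENTIFIER NOT NULL
);
```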

The tblJobs table is heavily in use. Records are inserted with an INSERT INTO. All larger operations first take a TABLOCKX on tblJobs to prevent any deadlock situations. Records are deleted with a simple DELETE given the specific job Id. The number of entries in tblJobs is about 100-5000 records. Each row contains 5 INTs, 1 DATETIME field and 1 UID, so the table is really small. It is not a mass of records, but we have constant INSERT/DELETE statements running against the table.
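A sketch of how the larger operations take the table lock (the actual statements are not shown here; the dummy SELECT is just one common way to acquire the TABLOCKX for the duration of the transaction):

```sql
BEGIN TRANSACTION;
-- take an exclusive table lock on tblJobs, held until COMMIT
SELECT TOP (0) Id FROM tblJobs WITH (TABLOCKX, HOLDLOCK);

-- ... larger operation on tblJobs ...

COMMIT TRANSACTION;
```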

We see a problem at a customer site running our software with SQL Server 2008 (Standard Edition). We implemented some test routines and isolated a really strange effect that shows up at this customer.

  • We have a test routine that performs a simple SELECT COUNT(*) FROM tblJobs WHERE Data=… to check whether the information we collected is still correct. This query should always return 1!
  • We launch the test at the beginning of a longer calculation in our application; it reports the desired information and succeeds, returning 1.
  • Records are repeatedly inserted into and deleted from tblJobs.
  • After the program has run for 30 seconds up to 5 minutes, the test above is repeated. Now tblJobs doesn’t show the needed information: the test statement returns 0, so there is no record with the desired information. Executing a SELECT * FROM tblJobs as the very next operation doesn’t show the expected record either. (The two SELECT statements were not part of one transaction.)
  • Because of the missing record, the process terminates. If we didn’t do the test, the next statements would just update a record in a static table, delete the (missing) record from the jobs table, and continue. Yes, I could ignore the effect, but it made me stumble: records should not disappear.
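The failing test boils down to something like this (the WHERE clause is elided above; `@Value` is just a placeholder for the actual criteria):

```sql
-- expected: 1; observed after 30 s to 5 min under load: 0
SELECT COUNT(*) FROM tblJobs WHERE Data = @Value;

-- executed immediately afterwards, in a separate transaction:
SELECT * FROM tblJobs;  -- the expected record is missing here too
```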

The isolation level should be READ COMMITTED, but I have seen cases where sessions show READ UNCOMMITTED. While searching for a solution and a reason for our problem, I found this SO question: SQL Server: Isolation level leaks across pooled connections

I couldn’t find a reason why some of the sessions show a permanent isolation level of READ UNCOMMITTED. I also can’t say anything about the isolation level at the moment the error occurs.
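For what it’s worth, the isolation level each session is currently using can be inspected on the server like this:

```sql
SELECT session_id, transaction_isolation_level
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;
-- transaction_isolation_level: 1 = ReadUncommitted, 2 = ReadCommitted,
-- 3 = RepeatableRead, 4 = Serializable, 5 = Snapshot
```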

I can also say that this only happens when the SQL Server is under heavy load. As far as I can see, the server has no memory pressure.

What I did:

  • I produced a complete SQL Profiler trace. We were able to reduce it to a size of only 100,000 statements, from the beginning of the process that inserts the entry into tblJobs until the point where the SELECT COUNT(*) FROM … WHERE … returned 0 for the desired record.
  • At the moment the error occurred we had 41 clients running (41 entries in tblSessions). The tblLocks table had approximately 3000 records.
  • In the profiler trace we can’t find any DELETE statement for the missing record. We can find hundreds of updates, inserts and deletes affecting other records, but none for the record we are looking for. And of course the tblSessions record still exists, and other records for the session still exist as well.

We also created a test scenario with the same setup and software, but we were never able to reproduce the problem.

Making changes to the program and rolling them out at the customer isn’t an easy job, so I can’t provide new information quickly.

So my question:

Is there a situation in which a SELECT COUNT(*) statement may miss a record that was previously returned and later isn’t returned, when the isolation level is READ COMMITTED or READ UNCOMMITTED? Are there locking considerations that can cause a record not to appear in a SELECT statement?

xMRi
  • You might get more traction on this if you migrate it to dba.stackexchange.com. Interesting conundrum. – Eric Brandt Sep 13 '18 at 18:07
  • Thanks. I didn't know that. I will try tomorrow! – xMRi Sep 13 '18 at 18:15
  • Also asked this question here: https://dba.stackexchange.com/questions/217588/is-it-possible-that-a-select-statement-doesnt-recognize-a-row-that-was-existent – xMRi Sep 14 '18 at 06:27
  • Second question on dba is now on hold, because of the cross post. – xMRi Sep 14 '18 at 09:44
  • Sorry. I really meant “migrate”, not cross-post. https://meta.stackexchange.com/questions/10249/what-is-migration-and-how-does-it-work – Eric Brandt Sep 14 '18 at 10:10

1 Answer


Yes. In Read Uncommitted mode, a query can read data that has not been committed yet and therefore still has some form of lock on it. In Read Committed, your read will request a shared lock and will have to wait if the insert holds an exclusive lock.

Then you have to consider whether you are dealing with row locks, page locks or table locks. Could it be that page locks are happening? Row A happens to exist on the same page as Row Z; Row A is being updated, so you can’t find Row Z until that update is done. Did someone turn on a setting to allow page locks? Also, you should know that using WITH (NOLOCK) in a query puts that part of the query into Read Uncommitted.
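To make the two points above concrete (the index name in the ALTER INDEX line is an assumed example, not taken from the question):

```sql
-- this part of the query runs at Read Uncommitted,
-- regardless of the session's isolation level
SELECT COUNT(*) FROM tblJobs WITH (NOLOCK) WHERE Data = @Value;

-- check whether page locks are allowed on the table's indexes
SELECT i.name, i.allow_page_locks
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('tblJobs');

-- disable page locks on a specific index (PK_tblJobs is an assumed name)
ALTER INDEX PK_tblJobs ON tblJobs SET (ALLOW_PAGE_LOCKS = OFF);
```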

Watch this https://m.youtube.com/watch?v=EqfAPZGKifA

Roger